HyperCore Linux: A tiny portable Linux designed for reproducibility (hyperos.io)
186 points by jimmcslim on Sept 10, 2015 | 49 comments



I like this idea, but I do not really understand who it is aimed at. The focus seems to be on reproducibility of scientific experiments and code, which is great! Many existing code artifacts are WOGSL (Works On Graduate Student's Laptop), which is the CS equivalent of "runs when parked".

So, let's break down the fields of CS for which this should be applicable:

* Systems: This won't work except for the few systems projects that are entirely in-RAM AND will work on Tiny Core's kernel version.

* ML: This, I can see, especially with the seeming focus on dataset management. Much ML is compute-bound, and the overhead of using the FUSE filesystems is hopefully negligible.

So, is this focused on ML and ML-using code and experiments? If so, I think that should be clarified. I think a lot of systems folk will be (rightly or wrongly) turned away from it due to the seeming overhead of the various hyper* extensions. Not to mention that they are all written in Node/JS (again, rightly or wrongly, many systems folk will not want to run their stuff on platforms written in JS).

I like the direction this project can go, but there seems to be a lack of focus or direction in your mission right now.


> So, let's break down the fields of CS for which this should be applicable:

> So, is this focused on ML and ML-using code and experiments?

You're completely missing the point. Please look into 'Computational Science' (or Scientific Computing, or Numerical Analysis); it applies to 80%+ of all disciplines that exist today (e.g., computational physics, comp. biology, comp. economics, computational aspects of engineering disciplines; the list goes on).


Yup, I can see it for computational experiments or "applied CS" fields. I realized this soon after I posted the comment, but I didn't bother to update it.

However, this still isn't clear on their website. I will give them the benefit of the doubt since they are early in the project, but I think it would behoove them to nail down their mission sooner rather than later.

This is probably what I get for being in the CS bubble. =)


Well, to their credit, they did mention "scientific research reproducibility", which is a very well known phrase in computational circles.

But I agree, it would help if they expanded on this from a pure CS point of view. Especially if they mention things like containers, CS people would be interested in finding out what they're up to.


I guess you could also say that CS is one of those "applied math" fields :)

Seriously though, this kind of platform is a critical component in scientific reproducibility. The dream is that we can have code, data, and the results of composing the two in the same revision control system. A minimal layer to allow the execution of Linux software would support the use of legacy code and binaries on this new platform. Javascript has its advantages, but it's a waste to build a data RCS and require all functions on the data to be written in it.

And to go a bit further, it's not just for science. For example, you could write an HN clone in dat. I could fork it and get both your code and all the posts.


I was hoping this was going to be a Linux distro built reproducibly (as in, the same binary builds given the same compiler toolchain), but was disappointed.


I'm really curious as to what exactly you mean by this...does the same code through the same compiler not reliably produce the same binary? I know very little about actual compiler mechanics, but non-deterministic compilation seems really strange to me.


It is super strange and has a huge number of undesirable consequences.

If you compile a "hello world" type C program, you should get the same binary when you compile it again, provided your toolchain (C library, C compiler, linker, etc.) is the same.

However, certain C macros like __DATE__ make the binary change (in this case, based on the time of compilation). Additionally, environment details like your working directory and your username sometimes end up in the binary.
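
For illustration, here is a minimal sketch: a C program that bakes its own build time into the binary via the standard __DATE__ and __TIME__ macros (the filename is made up for the example).

  /* build_stamp.c - a minimal sketch of one source of
     non-reproducibility; nothing project-specific here */
  #include <stdio.h>

  int main(void) {
      /* __DATE__ and __TIME__ expand, at compile time, to the date
         and time of compilation, so this string changes with every
         build even though the source does not. */
      printf("built on %s at %s\n", __DATE__, __TIME__);
      return 0;
  }

Compile it twice, a minute apart, with an identical toolchain, and the two binaries differ, so a simple checksum comparison fails even though nothing in the source changed.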

Why is this bad?

If the build server for Debian gets hacked or if a developer's machine gets hacked (for some projects), the hackers can modify the binaries. If the program is not reproducible then there is no way to tell that something has gone amiss. If the program can be built reproducibly, someone else can build the code, produce the same binary, and validate it.

This is even scarier in the case of a "Ken Thompson" style hack, where the C compiler binary is modified so that it compiles normally but inserts backdoors into certain libraries, and also inserts its modifications whenever it is building another C compiler.

If the "Ken Thompson" style hack is ever pulled off on a linux distro, there would be no real way to tell without analysing the binaries.

Provided your initial C compiler is good. Having a chain of reproducible builds where each build produces the same binary would prevent against this. Currently we are just producing random binaries and relying on trust which is horrible.


Another case is when you have to release a maintenance fix for a product released a few years ago.

You likely don't have the old toolchain installed, so you reinstall everything, pull a VM, whatever, and then rebuild the last official version.

I would be so much more comfortable if what you just rebuilt had the same md5 as the one deployed in the field. Because even before applying and testing the patch, you are not absolutely sure you have rebuilt exactly the same application.
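
As a sketch of the kind of check meant here (byte-for-byte comparison standing in for comparing md5 sums; the filename and usage are made up for illustration):

  /* same_binary.c - hypothetical helper: checks that a rebuilt
     binary is bit-for-bit identical to the one in the field */
  #include <stdio.h>

  int main(int argc, char **argv) {
      if (argc != 3) {
          fprintf(stderr, "usage: %s old_binary new_binary\n", argv[0]);
          return 2;
      }
      FILE *a = fopen(argv[1], "rb");
      FILE *b = fopen(argv[2], "rb");
      if (!a || !b) { perror("fopen"); return 2; }
      int ca, cb;
      do {
          /* read both files in lockstep; any mismatch (including one
             file ending early) means the builds are not identical */
          ca = fgetc(a);
          cb = fgetc(b);
          if (ca != cb) { puts("binaries differ"); return 1; }
      } while (ca != EOF);
      puts("binaries identical");
      return 0;
  }

In practice `cmp` or `md5sum` does the same job; the point is that only with reproducible builds can this check actually succeed.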


Object code often has the date/time encoded in it, for example.

Debian is updating a lot of their compile/build scripts to make things 100% byte-for-byte identical, and you can see that while there isn't a lot of work involved per package, it's still going to take a while to hand-update many thousands of packages:

https://wiki.debian.org/ReproducibleBuilds


It depends. Technically, a compiler's output should be deterministic, but apparently there are a few that aren't; one example is any compiler built on Roslyn[1].

Although I think he might have been pointing out the problem of reproducible builds[2].

[1] https://github.com/dotnet/roslyn/issues/372

[2] https://wiki.debian.org/ReproducibleBuilds/About


You should look at NixOS[1] and GuixSD[2].

[1]: https://nixos.org

[2]: http://www.gnu.org/software/guix


It may also be worth mentioning Baserock[1] here.

[1]: http://wiki.baserock.org/


> sudo linux boot

  /usr/local/lib/node_modules/linux/cli.js:60
    fs.accessSync(keyPath)
       ^
  TypeError: Object #<Object> has no method 'accessSync'


just fixed this. reinstall and try again


> sudo linux boot

  Error: Timed out waiting for linux to boot

that's because xhyve fails with:

  vmx_init: processor not supported by Hypervisor.framework
  Unable to create VM (-85377018)
might be better to output the error from xhyve right away


`npm install linux`


Right? Bold move, taking that as the npm package name.


I think they should rename tbh. They are not 'linux'.


Nor is Karma the universal force of cause and effect commonly called "karma," but somehow we all get by with npm install karma.


tbh, it's far different to...

- use an abstract concept as a metaphor/marketing for your product

than to

- use product "L", which has been influential and important historically, currently, and likely well into the future, as your product's dependency,

- and then market your product as product "L" on one of the most important package managers for startups, enterprises, and hobbyists.

They have but a single distribution. They are not "Linux". They are just one example of it.


I agree. It would be better as `npm install hypercore` or something.


honestly, i'm surprised it wasn't taken before


The naming annoys me, and so does the title; it is not a Linux distribution. I stopped reading when I got to the npm part. I assume this is done in JS?

Please be explicit and say that this is NOT a Linux distribution. In all seriousness, I don't understand how this can be called Linux; can someone explain it to me?


I fully support this.


it's sort of a thug life moment for javascript.


What speed stats would people be getting with this? The idea of Node managing a hypervisor Linux VM somehow seems unrealistic in performance terms, but I might be hugely prejudiced on this, so who knows.


I don't see where it has anything to do with Node. It looks like it's using NPM to install it, but the actual running of the VM looks like it goes through the standard hypervisor of the host OS. NPM is technically separate from Node (though yes, the majority of its use is for Node modules). If it is using Node for anything, I can only imagine it is as a replacement for shell scripting.

Interesting, using NPM. Now that I think about it, it's the only package manager that I know of that runs on and is commonly used on all 3 major OSes.


You mean like pip, easy_install, `go get`, gem, or PPM? How about even Tarballs?

/s kids these days, Javascript this, node that...


Or, you could just go to the lowest common denominator and use Make. It's not like that problem space hasn't been thoroughly explored for decades. For any problem you would have, I imagine FreeBSD encountered and solved it in the ports system a long time ago.


Except, you know, Windows.


True, but for Windows it's no worse than installing some other package manager that isn't installed automatically.

Which is all of them, except for Windows Update.

Except if you have Microsoft Visual Studio installed, in which case you probably already have a make system: nmake.

Unfortunately, nmake isn't quite compatible with regular Makefiles.

Luckily, there are plenty of resources on how to set up Makefiles so they work for both nmake and GNU make.

But since make is so simple, it has been compiled and available for Windows for decades, so just shipping it with the Makefile for Windows is probably simplest.


Tell me where your lawn is so I can get off it!


It looks like the work is done by Hypervisor.framework and xhyve, which are written in C. Hitting the FUSE filesystem won't be fast, but then neither was AuFS.


Seems like a competitor to Vagrant as much as anything else. Given the ubiquity of VirtualBox and the portability of its images, that seems like the tool it will most often be compared to.

That or Docker Toolbox.



"15. Rule of Optimization: Prototype before polishing. Get it working before you optimize it."

The Dat team has a solid track record of creating separate packages for binaries; they're responsible for maintaining the Fuse, Electron, and LevelDB packages on NPM. Optimizations are likely to follow.


Fair enough.


For the first time in quite a while, I am actually tempted to update my Mac to Yosemite. It isn't explicit, but I am fairly certain it won't work on older versions of OS X.


Yes, Yosemite was the first version to come with Hypervisor.framework.


Also check out NixOS if you want a bona fide Linux distribution: http://nixos.org/


why is this using npm and not brew?


this is not osx-specific. it's gonna be installable on other platforms as well (linux/osx/windows)


Why does this use npm instead of a tarball?


They are hoping to support Hyper-V soon (open task).


yeah, I think I'm not alone in preferring npm for (cross-platform) node packages and homebrew for (osx-specific) general-purpose utilities


can we stop with .js already?


Care to explain why anyone should listen to you in this regard?


?



