Gorgonia: a library like Theano or TensorFlow, mainly written in Go (chewxy.com)
180 points by boyter on Sept 19, 2016 | hide | past | favorite | 53 comments


I don't see the point of writing a machine learning package in Go.

Go is designed for systems programming, which is the opposite of scripting. In scientific computing, interactivity is an important feature. Software like MATLAB and IPython (now Jupyter) became popular because of this. Go doesn't have a nice REPL like IPython, and it never will. Besides, Go doesn't allow operator overloading, which leads to verbose programs. Compare `c = a @ b` with `c := Must(Mul(a, b))`: the former Python statement is much cleaner than the Go one.

Usually, researchers use MATLAB, R, and Python for prototyping and training. The final model will be put into production by using something like TensorFlow Serving or by hand-written C++ for performance. Even if the final product uses Go for serving, the Go program mostly runs the model via cgo or RPC.

Writing in C/C++ and wrapping the library in Python and Go is a much saner option. TensorFlow and MXNet are both good examples.


I've been

> scientific computing

for quite some time, and I've seen the value of a REPL steadily decline: what's far more useful is a tested code snippet or function and, occasionally, a debugger.

Also, if a scientific stack were available in Go, I'd jump on it, because deploying Go is so much simpler than Python. Python's deployment story for scientific software often starts with either (a) creating the perfect storm of dependency versions or (b) using Anaconda. Even MATLAB is way better in this regard.


In fairness, "use Anaconda" usually nullifies this dependency / deployment problem completely.


The "perfect storm of dependency versions" is a problem in Go as well though?


Not in practice. The vendoring solution that they are standardizing on makes versioning issues very manageable.


Which solution is this? Do you have any links or more information about that specifically?



Out of interest, what do you find lacking in the https://github.com/gonum offering as a scientific stack?


I agree with you in general. However, I contend that people are lazy. Part of the reason why I wrote Gorgonia was because I spent waay too long trying to deploy theano on the cloud (this was circa 2 years ago).

Also, thanks for informing me about MXNet. I've never seen that before. Kinda cool.


> I don't see the point of writing a machine learning package in Go.

If nothing else, to prove a point that there are solid alternatives to C and C++, in memory safe languages.


If the point is feasibility, Java has shown us that for a long time... unless we consider the null pointer evidence that it's not memory safe.


Not only feasibility, but having a GC-enabled programming language with AOT compilation to native code (until Java 10, you need to buy such compilers, as the FOSS ones aren't as good).

After all, the OP mentioned that only C and C++ were valid alternatives, which is why it is so important to implement such tools in other, safer languages.

If people don't do it, there will be the usual myth that only those languages are able to produce this type of application.

Similar to the myth that C was the first systems programming language, when many of us were using something else before UNIX got widespread in the industry.


This is starting to seem fairly specific -- machine learning with memory-safe languages that have AOT compilation to native code. It's not even "memory-safe languages that are performant", because Java would totally match that requirement.

What I'm suggesting here is that there is nothing really remarkable about it, from either a research or implementation perspective, given the achievements of the Java community in recent years.


Interactivity can be an important feature for scientific programming, but so is having correct underlying algorithms that are implemented in a clean, composable way. Go is a good language for such implementations.

> Go doesn't have a nice REPL like IPython, and it never will.

https://github.com/gopherds/gophernotes


> Go doesn't have a nice REPL like IPython, and it never will.

https://github.com/motemen/gore

There are a few others too.


It's not a real repl. It just records everything you've typed, puts it in a file, and compiles and runs it. It does that every time you add a new line. It's a huge hack and doesn't behave like you'd expect at all.



Yes, but that's not exactly a REPL, is it? It's just an incomplete tool that may someday be used to create a REPL.

Go is not set up to have a REPL; even if someone develops one someday it's always going to be limited compared to a language that was designed to do it. REPLs are actually pretty hard; even some languages designed with them from day one (or at least very early) like Haskell still come with an interesting list of caveats about the REPL.


I beg to differ. See: https://github.com/sbinet/igo

It is a true REPL (not like "gore", mentioned elsewhere in this thread). It doesn't implement the whole Go spec yet, but AFAICT there's nothing in the spec that prevents `igo` from implementing all of it. It's "just" work. Moreover, with the introduction of "buildmode=plugin" in the yet-to-be-released Go 1.8, importing packages on the fly should be easily implementable.

("my" new vehicle to provide a true REPL for Go is at https://github.com/go-interpreter)


If you have to link to a different project, then the one I said is not a REPL is indeed not a REPL, no?

Further, I'd suggest waiting for one of these projects to complete before you confidently declare that the REPL will be great. I didn't say it's impossible, I said "it's always going to be limited compared to a language that was designed to do it". I looked into what it would take to make a REPL around the Go 1.2 timeframe. There are many issues, such as the inability to query a package for what symbols it has, or the way the obvious workarounds have their own problems, brought on for instance by the fact that you can't re-export symbols in another package so the whole "just keep compiling a module incrementally" has its own issues. At this point I'm pretty confident that the best reasonably-possible Go REPL (i.e., not one that basically reimplements Go again) is going to come with a list of caveats a mile long. And if you do reimplement Go for the REPL, which a SSA interpreter basically is, you don't really have a Go REPL... you have a Go-like REPL at best.


I was challenging your "Go is not set up to have a REPL" assertion. We are in violent agreement about the (non) "REPL-ness" of _gore_ and the fact that "just keep compiling a module incrementally" is a dead end.

I don't see how reimplementing Go in Go but with a REPL (on top of, let's say, an SSA interpreter) is a Go-like REPL at best and not just a Go REPL. Could you expand a bit?

AFAICT, once one has a Go interpreter (the x/tools go/ssa package is basically just that), implementing a REPL is "just" providing a program counter to the interactive interpreter layer, together with the ability to modify that PC... (many details have been dropped on the floor, of course)



The problem I see with it is not so much that you don't want a native, compiled implementation of a machine learning library as that you want to be able to call it from Python, MATLAB, and R. It is somewhat less straightforward to go Python<->Go than Python<->C or even Python<->Rust.


https://github.com/go-python/gopy

It automatically creates a CPython-2 extension module out of a Go 1.5 package.

I plan to update it for Go>=1.6 and also directly generate a "cffi" Python module, so that CPython-{2,3} and PyPy are supported out of the box.


Woah.

Sorry, I have nothing of substance to say, but this deserves more than an upvote.


FWIW: there is a virtually identical golang equivalent to Jupyter in the golang tour (which is all open source).


FWIW: You can use Go in Jupyter https://github.com/gopherds/gophernotes


You can even use it on the web: https://github.com/gopherds/mybinder-go


The tensorflow team is working on a go wrapper as well:

https://github.com/tensorflow/tensorflow/issues/10#issuecomm...


Hi,

I wrote Gorgonia: https://github.com/chewxy/gorgonia

If there are any questions I'm available to answer them


How does it handle recurrent connections? I really like Microsoft's approach with CNTK - they use a DSL called 'BrainScript' to define the network, and it's basically a computational network (hence the name), but there are special operations like

    x = PastValue(y)
and

    y = FutureValue(x)
for forward and backward recurrent connections. It even automatically unrolls only the recurrent part of the network.

In contrast, all of the other NN systems I've used have premade layers like LSTM and GRU, and to be honest their methods are so hacky I haven't really been able to work out how to do custom RNNs in any other system.

How does Gorgonia handle it?


The trick, IMO, is to manually unroll them.

So what I do is something like this:

    func (m *model) costFn() (cost *Node, n int, err error) {
        var prev *recurrentOutput
        for ... {
            // build your recurrent graph here
        }
        return cost, n, err
    }
    
    func (m *model) runOne() (err error) {
        cost, _, err := m.costFn()
        if err != nil {
            return err
        }
        g := m.g.SubgraphRoots(cost)
        machine := NewLispMachine(g)
        if err = machine.RunAll(); err != nil {
            return err
        }
        return nil
    }
The LispMachine type allows for rapid prototyping. Then once that's all done and stuff, you would probably have figured out what the size of your graph is, and you can then create a TapeMachine that has the sizes you defined (though I have removed all the bits of the TapeMachine that made it Turing complete, so it may be a bit hard to do that).


Manual unrolling fails as soon as the number of repetitions is not fixed (e.g. NLP tasks where it matches the number of tokens in a sentence); so you want the looping to happen after the graph has been made, preferably on the GPU.


For Gorgonia it's the other way around. You build the graph by manually unrolling it. It'll be a big goddamn graph, which is why you subgraph it and run it.

source: actually have various LSTMs running. Some with attention, some with ADHD


Is performance acceptable if you're required to re-build/adjust the graph for every single minibatch, because the number of unrolled items is different every time?

That is what I mean by "Manual unrolling fails as soon as the number of repetitions is not fixed"; you can do manual unrolling in every framework but generally repeated graph building makes it unusably slow if you're unable to do the loop "in the system" with a runtime-chosen number of repetitions.

Padding is also not really a solution, since your average sequence length is likely 5 or 10 times less than the maximum sequence length that you want to support, so just padding to a fixed size will mean 5-10 times slower processing.


I just checked out BrainScript. Looks pretty cool. Hacking up a parser for that into Gorgonia should be fairly trivial. Could be a fun weekend hacking project.


This is awesome! Great to see so much momentum on the Go data science front (see https://github.com/gopherds/resources/blob/master/tooling/RE... for more). Go provides a great integrity and deployment story for data scientists, and contributions like this are well worth the effort!


Yup, amazing work. I hope it gets traction. In fact, there's a lot happening in the "Machine Learning" front of Go https://go.libhunt.com/categories/507-machine-learning


I think Microsoft should take some initiatives to help bring more of these projects into the .NET ecosystem. C# is a fantastic language, and is generally missing out on a lot of this ML stuff.

(Although maybe it conflicts with their desire to profit from it?)


Microsoft has CNTK - https://github.com/Microsoft/CNTK

Which surprisingly is one of the few I didn't refer to when implementing Gorgonia


There are no conflicts. The latest strategy across the board is to give away everything for free and then run it on the cloud. In that sense Microsoft actually has a lot to gain by offering a solution in .NET and then making it dead simple to deploy to Azure.

Software and hardware are now commodities and all the big players know it. So the only way to make money is to add value some other way, like making it dead simple to deploy TensorFlow graphs on GCE.


I agree with you on Microsoft's/Google's/Amazon's cloud business strategy. In spite of the apparently good profits Amazon is making, I question how profitable this business will be long term. I expect large gross sales and small profits because of intense competition.


Besides the CNTK referred to in a sibling comment, I imagine Alea GPU (http://quantalea.com/) could be used for such purposes.


Number one on my wish list for Go 2 is the ability to go from A to B:

A =

    if xw, err = Mul(x, w); err != nil { log.Fatal(err) }
    if xwpb, err = Add(xw, b); err != nil { log.Fatal(err) }
    if probs, err = Sigmoid(xwpb); err != nil { log.Fatal(err) }

B =

    probs := Sigmoid(Add(Mul(x, w), b))


Go 1 is already there:

    probs := MustSigmoid(MustAdd(MustMul(x, w), b))


That panics. What I want is monadic error chaining.


    Fixed up GPU work
I haven't thought about using golang to feed data to a GTX 1080 or similar, but before I do that, I'd like some info on the above; a quick Google search suggests there may not be a complete wrapper for the CUDA Driver or Runtime APIs like PyCUDA.


Hi,

So I wrote my own CUDA wrapper. Turns out there was a bug. I'm not sure where and how, but it worked for the earlier versions of Gorgonia but not this release.

You may also note that a bunch of the assembly for the float32 tensor is missing... it seems to be affected by the same bug, so what I'm doing now is planning a rewrite from scratch.


Did you reach out to Nvidia? It seems like they should be willing to help with putting out tested/documented bindings for more languages.

Also, where in the repo is your wrapper? (I don't have enough knowledge to debug something like that, just curious how they work.)


It's not released - simply because it failed to even build properly.

No, I have not reached out to Nvidia. Don't have any contacts there.


You could possibly try https://twitter.com/AlisonBLowndes


Why not OpenCL? With AVX2, it's quite a performer even on Xeons.


It's still lacking highly optimized libraries like cuDNN, last time I looked :/



