I don't see the point of writing a machine learning package in Go.
Go is designed for systems programming, which is the opposite of scripting. In scientific computing, interactivity is an important feature; software like MATLAB and IPython (now Jupyter) became popular because of it. Go doesn't have a nice REPL like IPython, and it never will. Besides, Go doesn't allow operator overloading, which leads to verbose programs. Compare `c = a @ b` with `c := Must(Mul(a, b))`: the former Python statement is much cleaner than the Go one.
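To make the verbosity comparison concrete, here is a minimal sketch of the `Must(Mul(...))` idiom. The `Matrix`, `Mul`, and `Must` definitions below are toy stand-ins for illustration, not Gorgonia's actual types:

```go
package main

import "fmt"

// Matrix is a toy fixed-size 2x2 matrix; real tensor libraries are far more general.
type Matrix [2][2]float64

// Mul returns the matrix product along with an error, mirroring the
// (value, error) return shape that Go APIs use instead of an overloaded `*`.
func Mul(a, b Matrix) (Matrix, error) {
	var c Matrix
	for i := 0; i < 2; i++ {
		for j := 0; j < 2; j++ {
			for k := 0; k < 2; k++ {
				c[i][j] += a[i][k] * b[k][j]
			}
		}
	}
	return c, nil
}

// Must panics on error, trading safety for brevity -- the idiom behind
// writing `c := Must(Mul(a, b))` on a single line.
func Must(m Matrix, err error) Matrix {
	if err != nil {
		panic(err)
	}
	return m
}

func main() {
	a := Matrix{{1, 2}, {3, 4}}
	b := Matrix{{5, 6}, {7, 8}}
	c := Must(Mul(a, b))
	fmt.Println(c) // prints [[19 22] [43 50]]
}
```

In Python the same computation is just `c = a @ b`; the Go version pays for explicit error plumbing at every operation.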
Usually, researchers use MATLAB, R, and Python for prototyping and training. The final model is put into production using something like TensorFlow Serving, or hand-written C++ for performance. Even if the final product uses Go for serving, the Go program mostly runs the model via cgo or RPC.
Writing the core in C/C++ and wrapping the library in Python and Go is a much saner option. TensorFlow and MXNet are both good examples.
I've been doing scientific computing for quite some time, and I've watched the value of a REPL steadily decline: what's far more useful is a tested code snippet or function, and occasionally a debugger.
Also, if a scientific stack were available in Go, I'd jump on it, because deploying Go is so much simpler than Python. Python's deployment story for scientific software often starts with either (a) creating the perfect storm of dependency versions or (b) using Anaconda. Even MATLAB is way better in this regard.
I agree with you in general. However, I contend that people are lazy. Part of the reason I wrote Gorgonia was that I spent way too long trying to deploy Theano in the cloud (this was circa 2 years ago).
Also, thanks for informing me about MXNet. I've never seen that before. Kinda cool.
It's not only about feasibility, but about having a GC-enabled programming language with AOT compilation to native code (until Java 10, you need to buy such compilers, as the FOSS ones aren't as good).
After all, the OP mentioned that only C and C++ were valid alternatives, which is why it is so important to implement such tools in other, safer languages.
If people don't do it, then as usual a myth will arise that only those languages are able to produce this type of application.
Similar to the myth that C was the first systems programming language, when many of us were using something else before UNIX became widespread in the industry.
This is starting to seem fairly specific: machine learning with memory-safe languages that have AOT compilation to native code. It's not even "memory-safe languages that are performant", because Java would totally match that requirement.
What I'm suggesting here is that there is nothing really remarkable about it, from either a research or implementation perspective, given the achievements of the Java community in recent years.
Interactivity can be an important feature for scientific programming, but so is having correct underlying algorithms that are implemented in a clean, composable way. Go is a good language for such implementations.
> Go doesn't have a nice REPL like IPython, and it never will.
It's not a real repl. It just records everything you've typed, puts it in a file, and compiles and runs it. It does that every time you add a new line. It's a huge hack and doesn't behave like you'd expect at all.
Yes, but that's not exactly a REPL, is it? It's just an incomplete tool that may someday be used to create a REPL.
Go is not set up to have a REPL; even if someone develops one someday it's always going to be limited compared to a language that was designed to do it. REPLs are actually pretty hard; even some languages designed with them from day one (or at least very early) like Haskell still come with an interesting list of caveats about the REPL.
It is a true REPL (not like "gore", mentioned somewhere in that thread).
It doesn't implement the whole Go spec, but AFAICT there's nothing in the spec that prevents `igo` from implementing all of it.
It's "just" work.
Moreover, with the introduction of `-buildmode=plugin` in the yet-to-be-released Go 1.8, importing packages on the fly should be easily implementable.
If you have to link to a different project, then the one I said wasn't a REPL indeed isn't a REPL, no?
Further, I'd suggest waiting for one of these projects to complete before you confidently declare that the REPL will be great. I didn't say it's impossible, I said "it's always going to be limited compared to a language that was designed to do it".

I looked into what it would take to make a REPL around the Go 1.2 timeframe. There are many issues, such as the inability to query a package for what symbols it has, or the way the obvious workarounds have their own problems, brought on for instance by the fact that you can't re-export symbols in another package, so the whole "just keep compiling a module incrementally" approach has its own issues.

At this point I'm pretty confident that the best reasonably-possible Go REPL (i.e., not one that basically reimplements Go again) is going to come with a list of caveats a mile long. And if you do reimplement Go for the REPL, which an SSA interpreter basically is, you don't really have a Go REPL... you have a Go-like REPL at best.
I was challenging your "Go is not set up to have a REPL" assertion.
We are in violent agreement about the (non-)"REPL-ness" of _gore_ and about the fact that "just keep compiling a module incrementally" is a dead end.
I don't see how reimplementing Go in Go but with a REPL (on top of, let's say, a SSA interpreter) is a Go-like REPL at best and not just a Go REPL.
Could you expand a bit?
AFAICT, once one has a Go interpreter (x/tools/go/ssa is basically just that), implementing a REPL is "just" providing a program counter to the interactive interpreter layer, together with the ability to modify that PC... (many details have been dropped on the floor, of course)
The problem I see with it is not so much that you don't want a native, compiled implementation of a machine learning library, as that you want to be able to call it from Python, MATLAB, and R. It is somewhat less straightforward to go Python<->Go than Python<->C or even Python<->Rust.
How does it handle recurrent connections? I really like Microsoft's approach with CNTK: they use a DSL called BrainScript to define the network, and it's basically a computational network (hence the name), but there are special operations like
x = PastValue(y)
and
y = FutureValue(x)
for forward and backward recurrent connections, respectively. It even automatically unrolls only the recurrent part of the network.
In contrast, all of the other NN systems I've used have premade layers like LSTM and GRU, and to be honest their methods are so hacky I haven't really been able to work out how to do custom RNNs in any other system.
func (m *model) costFn() (cost *Node, n int, err error) {
    var prev *recurrentOutput // carries the previous step's output forward
    for ... {
        // build your recurrent graph here
    }
    return
}

func (m *model) runOne() (err error) {
    cost, _, err := m.costFn()
    if err != nil {
        return err
    }
    g := m.g.SubgraphRoots(cost)
    machine := NewLispMachine(g)
    if err = machine.RunAll(); err != nil {
        return err
    }
    return nil
}
The LispMachine type allows for rapid prototyping. Then, once that's all done, you will probably have figured out the size of your graph, and you can create a TapeMachine with the sizes you defined (though I have removed the bits of the TapeMachine that made it Turing complete, so that may be a bit hard to do).
Manual unrolling fails as soon as the number of repetitions is not fixed (e.g. NLP tasks, where it matches the number of tokens in a sentence); so you want the looping to happen after the graph has been made, preferably on the GPU.
For Gorgonia it's the other way around. You build the graph by manually unrolling it. It'll be a big goddamn graph, which is why you subgraph it and run it.
Source: I actually have various LSTMs running. Some with attention, some with ADHD.
Is performance acceptable if you're required to re-build/adjust the graph for every single minibatch, because the number of unrolled items is different every time?
That is what I mean by "Manual unrolling fails as soon as the number of repetitions is not fixed"; you can do manual unrolling in every framework but generally repeated graph building makes it unusably slow if you're unable to do the loop "in the system" with a runtime-chosen number of repetitions.
Padding is also not really a solution, since your average sequence length is likely 5 or 10 times less than the maximum sequence length you want to support, so padding to a fixed size means 5-10 times slower processing.
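The arithmetic behind that slowdown claim is simple enough to sketch; the numbers below are illustrative, matching the rough ratio in the comment above:

```go
package main

import "fmt"

// paddingSlowdown is the factor by which per-sequence compute grows when
// every sequence is padded to maxLen steps but only avgLen steps, on
// average, do useful work: the RNN still executes all maxLen steps.
func paddingSlowdown(maxLen, avgLen float64) float64 {
	return maxLen / avgLen
}

func main() {
	// e.g. max supported length 100 tokens, average sentence 10 tokens
	fmt.Printf("padding wastes %.0fx compute\n", paddingSlowdown(100, 10))
	// prints: padding wastes 10x compute
}
```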
I just checked out BrainScript. Looks pretty cool. Hacking up a parser for that into Gorgonia should be fairly trivial. Could be a fun weekend hacking project.
This is awesome! Great to see so much momentum on the Go data science front (see https://github.com/gopherds/resources/blob/master/tooling/RE... for more). Go provides a great integrity and deployment story for data scientists, and contributions like this are well worth the effort!
I think Microsoft should take some initiatives to help bring more of these projects into the .NET ecosystem. C# is a fantastic language, and is generally missing out on a lot of this ML stuff.
(Although maybe it conflicts with their desire to profit from it?)
There are no conflicts. The latest strategy across the board is to give away everything for free and then run it on the cloud. In that sense Microsoft actually has a lot to gain by offering a solution in .NET and then making it dead simple to deploy to Azure.
Software and hardware are now commodities, and all the big players know it. So the only way to make money is to add value some other way, like making it dead simple to deploy TensorFlow graphs on GCE.
I agree with you on Microsoft's/Google's/Amazon's cloud business strategy. In spite of the apparently good profits Amazon is making, I question how profitable this business will be long term: I expect large gross sales and small profits because of intense competition.
I haven't thought about using Go to feed data to a GTX 1080 or similar, but before I do that, I'd like some info on the above; a quick Google shows that there may not be a complete wrapper for the CUDA Driver or Runtime APIs like PyCUDA.
So I wrote my own CUDA wrapper. Turns out there was a bug. I'm not sure where and how, but it worked for the earlier versions of Gorgonia but not this release.
You may also note that a bunch of the assembly for the float32 tensors is missing... it seems to be affected by the same bug, so what I'm doing now is planning a rewrite from scratch.