I'm sorry, slight rant, but I do not think notebooks are good for a completed piece of software. Maybe they're fine for development and brainstorming, but once you're done, the code needs to be extracted into a more traditional format.
In research I have seen countless mathematica/python notebooks released into the wild, and they are all over the place. Maybe software developers will structure them better, but I'm kind of sick of seeing an incoherent spaghetti state, especially since there is no strict ordering of cell evaluation.
Maybe it would be nice if notebooks could break from the traditional "page of working" format. They could display as a tree or graph that orders cells by state and dependency in evaluation. Otherwise, they need an "export" button.
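That tree/graph ordering would amount to a topological sort of cells by what they depend on. A minimal sketch of the idea, using the standard library's `graphlib` and a made-up dependency map (cell name -> cells it depends on):

```python
from graphlib import TopologicalSorter

# Made-up example: which cells each cell depends on
deps = {
    "load_data": [],
    "clean": ["load_data"],
    "fit_model": ["clean"],
    "plot": ["clean", "fit_model"],
}

# A valid evaluation order that respects every dependency
order = list(TopologicalSorter(deps).static_order())
# e.g. ["load_data", "clean", "fit_model", "plot"]
```

A notebook UI built on something like this could refuse to show results whose upstream cells have changed, instead of leaving it to the reader to guess the evaluation order.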
What you complain about says more about the author of the notebook being disorganized than about the format. The format allows that mess to happen, but it's a lack of discipline on the author's part that creates it.
Many people don't care about reproducibility, or about having a paper that can simply be read and understood. They just care about getting their paper out. Part of the blame is also on the reviewers, who don't really evaluate this as part of the paper (assuming they're given access and the opportunity to evaluate the code, that is!).
I recently reviewed a paper for a conference and it was a mess in every aspect. I wish they had included the .ipynb file. Instead, they provided mangled (both by cropping and jpeg compression) screenshots of the notebook. The code was an utter mess (instead of doing X_prime = X[10:100], they did X_prime = [X[10], X[11], X[12], ..., X[98], X[99]]).
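For anyone unfamiliar with why that's so egregious, a minimal sketch of the two styles side by side (the array contents here are made up):

```python
# Made-up data for illustration
X = list(range(200))

# What the paper did: spell out every element by hand
X_prime_manual = [X[10], X[11], X[12]]  # ...continued all the way to X[99]

# The idiomatic equivalent: a single slice (start inclusive, stop exclusive)
X_prime = X[10:100]
```

The slice covers indices 10 through 99 in one expression; the manual version is 90 terms long and trivially easy to get wrong.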
I gave that mess the strong reject it deserved (the rest of the paper was on par with the code, if not worse). I'm still waiting to see whether the other reviewers will do the right thing, or whether they'll jump on the fad that """deep learning""" (they used 90's machine learning methods at best) has become.
Been using this for work projects. A lot of raised eyebrows when people hear jupyter-first development, but the automated docs, flexibility with prose, inline testing, out-of-the-box pip packaging, and git integration make it well worth it. A bit of a learning curve, but very rewarding.
I've written a lot of jupyter notebooks and honestly, emacs+org-mode is way better. Maybe if jupyter weren't in a web browser and had vim or emacs keybindings, it would be better. Also, I'm not sure notebooks are even a good idea: it's very easy to get into an inconsistent state, and with no textual source of truth it can be very difficult to get back.
Though, thinking about it, the real problem is that when I'm using a real editor (emacs), I feel like a wizard, I know it like the back of my hand and have any number of extensions and libraries I can use. With jupyter, I'm always fighting something and there's no meaningful way to configure it to do what you want. Also, the intellisense sucks. In addition, and maybe this is silly, but I find using a webbrowser to write code to be distasteful.
I'm feeling the same way. I really like the idea of jupyter, but it should be a native application, not something running in a browser. Maybe it's also me not being used to working with notebooks, but I find it strange that I manually have to reevaluate all following cells when I change an earlier one. Shouldn't that just happen automatically?
I dabbled a bit with EIN[1], an emacs client for Jupyter, but it didn't work all that well for me. In particular, it didn't play well with my dark color scheme, and you still needed a running jupyter server to connect to.
> I find it strange that I manually have to reevaluate all following cells when I change an earlier one. Shouldn't that just happen automatically?
I get where you are coming from but this would ruin a lot of my data analysis stuff. These are cases where I have 30 minute queries in the lower cells. I don't want those to fire every time.
What might be a nice addition is the ability to either 1) clear the output of all those cells, or 2) mark those cells as inconsistent.
That being said, there are enough other foot-guns besides out-of-order execution in jupyter notebooks. The number of times that persisted variables defined in long-deleted cells have masked bugs is more than I'd care to admit.
>The number of times that persisted variables defined in long-deleted cells have masked bugs is more than I'd care to admit.
Definitely. It happens particularly often when you rename a variable (and modify its definition) but forget to update the name where it's used further down in the notebook. Suddenly, without any warning, you are using stale data for your analysis, which can really throw a wrench in things. It would be nice if changing a cell erased all the old definitions from that cell.
And I don't think I'm the only one who doesn't trust their notebook definitions: I've noticed that pretty much everyone who uses notebooks, after finishing their analysis, restarts the kernel and reruns the entire notebook from start to finish, because they have little faith that the results on screen are actually derived from the cells currently in the notebook.
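The foot-gun is easy to reproduce in plain Python, since a kernel is just one long-lived interpreter session. A minimal sketch with made-up variable names, each comment block standing in for a cell:

```python
# Cell 1, original version: load some data
data = [1.0, 2.0, 3.0]

# Cell 2: analysis referring to `data`
mean = sum(data) / len(data)  # 2.0 -- fine so far

# Cell 1 is later edited: the variable is renamed and its contents change
clean_data = [1.0, 2.0, 3.0, 10.0]

# Re-running Cell 2 unchanged raises no NameError: the old binding `data`
# still lives in the kernel, so the analysis silently uses stale values
mean = sum(data) / len(data)  # still 2.0, not the 4.0 the new data implies
```

A fresh "restart and run all" would raise a NameError at Cell 2, which is exactly why people do it.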
This is something that I struggled with as well. Back before Jupyter was a big thing I wrote a system called bein (https://github.com/madhadron/bein) that promoted the individual execution to the primary artifact instead of the source code, since that's usually what data analysis and other computational science work really cares about.
Based on my intervening (looks at git timestamps) decade of thinking, I would probably approach it differently, but I think that key point of execution as the artifact and wanting to trace its provenance instead of wanting to track source code remains correct.
Also been using nbdev for work projects in past month. So far, it's been a great productivity boost.
I really, really like the specific aspect of testing with nbdev.
Your docs/examples are your tests. The notebook is a natural environment for scaffolding mocks and other things, without having to use a testing framework over top of the unittest objects.
I'm a long-time user of IDEs, and always will be, but if your metric for producing code is just lines of code, nbdev isn't for you. However, if your metric is producing documented, tested code that is maintainable, it's definitely, for me and my team, a big productivity boost.
nbdev works with any kernel, because it only cares about the notebook files, which as you know are just JSON.
I've been using it with xeus-python kernel lately.
Edit: Forgot the Git part. nbdev was also designed to work around git's limitations with notebooks. In fact, one of the commands in the tutorials sets up its built-in git hooks, which run special clean commands on commits.
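Since an .ipynb file is plain JSON (nbformat v4 keeps cells in a top-level "cells" list), pulling the code out of one is only a few lines. A minimal sketch, with a tiny notebook inlined as a string instead of read from disk:

```python
import json

# A minimal nbformat-v4 notebook, inlined here instead of loaded from a file
nb_json = '''
{
  "nbformat": 4, "nbformat_minor": 5,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    {"cell_type": "code", "metadata": {}, "execution_count": 1,
     "outputs": [], "source": ["x = 1\\n", "print(x)"]}
  ]
}
'''

nb = json.loads(nb_json)

# Pull out just the code cells, joining each cell's list of source lines
code_cells = [
    "".join(cell["source"])
    for cell in nb["cells"]
    if cell["cell_type"] == "code"
]
```

This is roughly why kernel-agnostic tooling like nbdev is possible: everything it needs is in that JSON, independent of what language the kernel runs.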