I'm sorry, slight rant, but I do not think notebooks are good for a completed piece of software. Maybe they're fine for development and brainstorming, but once you're done, the code needs to be extracted into a more traditional format.
In research I have seen countless mathematica/python notebooks released into the wild, and they are all over the place. Maybe software developers will structure them better, but I'm kind of sick of seeing an incoherent spaghetti state, especially since there is no strict ordering of cell evaluation.
Maybe it would be nice if notebooks could break from the traditional "page of working" format. They could display as a tree or graph that orders cells by state and dependency in evaluation. Otherwise, they need an "export" button.
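That tree/graph ordering would amount to a topological sort of cells by what they depend on. A minimal sketch of the idea, using the standard library's `graphlib` and a made-up dependency map (cell name -> cells it depends on):

```python
from graphlib import TopologicalSorter

# Made-up example: which cells each cell depends on
deps = {
    "load_data": [],
    "clean": ["load_data"],
    "fit_model": ["clean"],
    "plot": ["clean", "fit_model"],
}

# A valid evaluation order that respects every dependency
order = list(TopologicalSorter(deps).static_order())
# e.g. ["load_data", "clean", "fit_model", "plot"]
```

A notebook UI built on something like this could refuse to show results whose upstream cells have changed, instead of leaving it to the reader to guess the evaluation order.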
What you complain about says more about the author of the notebook being disorganized than about the format. The format allows that mess to happen, but it's a lack of discipline on the author's part that creates it.
Many people don't care about reproducibility, or about having a paper that can simply be read and understood. They just care about getting their paper out. Part of the blame is also on the reviewers, who don't really evaluate this as part of the paper (assuming they're given access and the opportunity to evaluate the code, that is!).
I recently reviewed a paper for a conference and it was a mess in every aspect. I wish they had included the .ipynb file. Instead, they provided mangled (both by cropping and jpeg compression) screenshots of the notebook. The code was an utter mess (instead of doing X_prime = X[10:100], they did X_prime = [X[10], X[11], X[12], ..., X[98], X[99]]).
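For anyone unfamiliar with why that's so egregious, a minimal sketch of the two styles side by side (the array contents here are made up):

```python
# Made-up data for illustration
X = list(range(200))

# What the paper did: spell out every element by hand
X_prime_manual = [X[10], X[11], X[12]]  # ...continued all the way to X[99]

# The idiomatic equivalent: a single slice (start inclusive, stop exclusive)
X_prime = X[10:100]
```

The slice covers indices 10 through 99 in one expression; the manual version is 90 terms long and trivially easy to get wrong.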
I gave that mess the strong reject it deserved (the rest of the paper was on par with the code, if not worse). I'm still waiting to see whether the other reviewers will do the right thing, or whether they'll jump on the fad that """deep learning""" (they used 90's machine learning methods at best) has become.
Been using this for work projects. A lot of raised eyebrows when people hear jupyter-first development, but the automated docs, flexibility with prose, inline testing, out-of-the-box pip packaging, and git integration make it well worth it. A bit of a learning curve, but very rewarding.
I've written a lot of jupyter notebooks and honestly, emacs+org-mode is way better. Maybe if jupyter weren't in a web browser and had vim or emacs keybindings, it would be better. Also, I'm not sure notebooks are even a good idea: it's very easy to get into an inconsistent state, and with no textual source of truth it can be very difficult to get back.
Though, thinking about it, the real problem is that when I'm using a real editor (emacs), I feel like a wizard, I know it like the back of my hand and have any number of extensions and libraries I can use. With jupyter, I'm always fighting something and there's no meaningful way to configure it to do what you want. Also, the intellisense sucks. In addition, and maybe this is silly, but I find using a webbrowser to write code to be distasteful.
I'm feeling the same way. I really like the idea of jupyter, but it should be a native application, not something running in a browser. Maybe it's also me not being used to working with notebooks, but I find it strange that I manually have to reevaluate all following cells when I change an earlier one. Shouldn't that just happen automatically?
I dabbled a bit with EIN[1], an emacs client for Jupyter, but it didn't work all that well for me. In particular, it didn't play well with my dark color scheme, and you still needed a running jupyter server to connect to.
> I find it strange that I manually have to reevaluate all following cells when I change an earlier one. Shouldn't that just happen automatically?
I get where you are coming from but this would ruin a lot of my data analysis stuff. These are cases where I have 30 minute queries in the lower cells. I don't want those to fire every time.
What might be a nice addition is the ability to either 1) clear the output of all those cells, or 2) mark those cells as inconsistent.
That being said, there are enough other foot-guns besides out-of-order execution in jupyter notebooks. The number of times that persisted variables defined in long-deleted cells have masked bugs is more than I'd care to admit.
>The number of times that persisted variables defined in long-deleted cells have masked bugs is more than I'd care to admit.
Definitely. It happens particularly often when you rename a variable (and modify its definition) but forget to update the name where it's used further down in the notebook. Suddenly, without any warning, you are using stale data for your analysis, which can really throw a wrench in things. It would be nice if changing a cell erased all the old definitions from that cell.
And I don't think I'm the only one who doesn't trust their notebook definitions: I've noticed that pretty much everyone who uses notebooks, after finishing their analysis, restarts the kernel and reruns the entire notebook from start to finish, because they have little faith that the results on screen are actually derived from the cells currently in the notebook.
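The foot-gun is easy to reproduce in plain Python, since a kernel is just one long-lived interpreter session. A minimal sketch with made-up variable names, each comment block standing in for a cell:

```python
# Cell 1, original version: load some data
data = [1.0, 2.0, 3.0]

# Cell 2: analysis referring to `data`
mean = sum(data) / len(data)  # 2.0 -- fine so far

# Cell 1 is later edited: the variable is renamed and its contents change
clean_data = [1.0, 2.0, 3.0, 10.0]

# Re-running Cell 2 unchanged raises no NameError: the old binding `data`
# still lives in the kernel, so the analysis silently uses stale values
mean = sum(data) / len(data)  # still 2.0, not the 4.0 the new data implies
```

A fresh "restart and run all" would raise a NameError at Cell 2, which is exactly why people do it.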
This is something that I struggled with as well. Back before Jupyter was a big thing I wrote a system called bein (https://github.com/madhadron/bein) that promoted the individual execution to the primary artifact instead of the source code, since that's usually what data analysis and other computational science work really cares about.
Based on my intervening (looks at git timestamps) decade of thinking, I would probably approach it differently, but I think that key point of execution as the artifact and wanting to trace its provenance instead of wanting to track source code remains correct.
Also been using nbdev for work projects in past month. So far, it's been a great productivity boost.
I really, really like the specific aspect of testing with nbdev.
Your docs/examples are your tests. The notebook is a natural environment for scaffolding mocks and other things, without having to use a testing framework over top of the unittest objects.
I'm a long-time user of IDEs, and always will be, but if your metric for producing code is just lines of code, nbdev isn't for you. However, if your metric is producing documented, tested code that is maintainable, it's definitely, for me and my team, a big productivity boost.
nbdev works with any kernel, because it only cares about the notebook files, which as you know are just JSON.
I've been using it with xeus-python kernel lately.
Edit: Forgot the Git part. nbdev was also designed to work around git's limitations with notebooks. In fact, one of the commands in the tutorials sets up its built-in git hooks, which run special clean commands on commits.
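Since an .ipynb file is plain JSON (nbformat v4 keeps cells in a top-level "cells" list), pulling the code out of one is only a few lines. A minimal sketch, with a tiny notebook inlined as a string instead of read from disk:

```python
import json

# A minimal nbformat-v4 notebook, inlined here instead of loaded from a file
nb_json = '''
{
  "nbformat": 4, "nbformat_minor": 5,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    {"cell_type": "code", "metadata": {}, "execution_count": 1,
     "outputs": [], "source": ["x = 1\\n", "print(x)"]}
  ]
}
'''

nb = json.loads(nb_json)

# Pull out just the code cells, joining each cell's list of source lines
code_cells = [
    "".join(cell["source"])
    for cell in nb["cells"]
    if cell["cell_type"] == "code"
]
```

This is roughly why kernel-agnostic tooling like nbdev is possible: everything it needs is in that JSON, independent of what language the kernel runs.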