> Pyro enables flexible and expressive deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling.
Does anyone who works in this area have a sense of why PPLs haven't really "taken off"? Like, of the last several years of surprising ML successes, I can't really think of any major ones that come from this line of work. To the extent that Bayesian perspectives contribute to deep learning, I more often see e.g. some particular take on ensembling around the same models trained to find a point estimate via SGD, rather than models built up from random variables about which we update beliefs, including representation of uncertainty.
Some might disagree with me but my best guesses are:
- Probability math is confusing and difficult, and a base understanding is required to use PPLs in a way that is not true of other ML/DL. Most CS PhDs will not be required to take enough of it to find PPLs intuitive, so to be familiar they will have had to opt into those classes. This is to say nothing of BS/MS practitioners, so the user base is naturally limited to the subset of people who studied Math/Stats in a rigorous way AND opted into the right classes or taught themselves later.
- Probabilistic models are often unique to the application. They require lots of bespoke code, modeling, and understanding. Contrast this with DL, where you throw your data in a blender and receive outputs.
- Uncertainty quantification often is not the most important outcome for sexy ML use cases. That is more frequently things like "accuracy," "residual error," or "wow that picture looks really good".
- PPL package tooling and documentation are often very confusing and don't work similarly to one another. This isn't necessarily the developer's fault, this stuff is hard, and the people with the domain knowledge needed to actually understand this stuff often have spent fewer hours in the open-source trenches.
Re your comment on CS PhDs not having probability background -- do you find that's true of ML researchers? I would understand that in a bunch of CS specialties, probability may not be a requirement, but in ML I would have expected otherwise.
Not OP but I deal with this a lot. In my experience a lot of folks working in mainstream ML haven't been exposed to it unless they specifically focused on it. It might just be a course load thing... getting the most out of these probabilistic PLs requires fairly deep expertise in both probability theory/Bayesian stats as well as in CS and you have a finite amount of courses you can take in school. Plus, a lot of the work in this area pre-dates the modern focus on deep learning or machine learning in general, so a lot of the knowledge tends to be held by professors/researchers that may not be as involved with the "new" ML courses. And of course, Math/Stats/CS departments don't always play nicely with each other and like to fight turf wars, though I've noticed cross-disciplinary research among the three becoming more accepted at the universities/institutes I work with.
As a case study, I did most of my grad work on solving Bayesian inverse problems using probabilistic programming for applications in engineering, which is pretty cross-disciplinary. I now work mostly in ML, but I didn't really even touch anything in the ML domain until after I finished school. I could have, the courses were available, but they just weren't relevant to me at the time.
Edit: I wouldn't be surprised if there was a considerable userbase in industries like finance, but in my experience those folks don't share much.
One of the best ML researchers I know has a background in signal processing (and degrees in EE to go with it). Not probability per se, but a field that heavily uses probability and statistics.
Can you elaborate? The unreasonable effectiveness of approximate methods on discretized spaces doesn't change the fact that the theory underlying it is exact and continuous.
You don’t need a professional license to do math. Lots of computer scientists do harder and more interesting mathematics than their peers in the math dept. In that respect at least, the main substantive difference between the fields is about $40k/yr.
You're just stating things without justifying them.
What else would you consider a subfield of CS? Finance? Accounting? Logistics? UI design?
What is or isn't a subfield of a given science has nothing to do with the professional qualifications of those who practice it or how the tools may be implemented. We don't call pharmaceuticals "a subfield of robotics" because of how the factories are built.
The same can unfortunately be said of many "statisticians", who use statistics as a big recipe book without understanding the first thing about the mathematical underpinnings of the topic.
Don't believe me?
Go ask the first statistician you run into to give you a half decent explanation of how the Chi-squared distribution and the Chi-squared test work, and see what happens.
I work on one of these PPLs, and I personally find Bayesian inference to be useful in a few cases:
1. When your main objective is not prediction but understanding the effect of some underlying / unobserved random variable.
2. When you don't have tons of data + you have very clear ideas of the data generation process.
(1) is mainly relevant for science rather than private companies. E.g., if you're an epidemiologist, you're generally speaking interested in determining the effect of certain underlying factors, e.g. the effect of mobility patterns, rather than just predicting the number of infected people tomorrow, since the hidden variables are often something you can directly control, e.g. by imposing travel restrictions.
(2) can occur either in academic settings or in the private sector, in applications such as revenue optimization. In these scenarios, it's also very useful to have a notion of the "risk" you're taking by optimizing according to this model. Such a notion of risk is completely straightforward in the Bayesian framework, while less so in frequentist scenarios.
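To make (2) concrete, here is a minimal sketch in Pyro (since that's the library under discussion). The prior, the made-up conversion counts, and the 10% break-even threshold are all illustrative assumptions, not a real model:

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

# Made-up data: 17 conversions out of 120 trials at a new price point.
conversions, trials = 17.0, 120

def model():
    # Prior encoding what we believe about the conversion rate before seeing data.
    rate = pyro.sample("rate", dist.Beta(2.0, 20.0))
    pyro.sample("obs", dist.Binomial(trials, rate), obs=torch.tensor(conversions))

mcmc = MCMC(NUTS(model), num_samples=1000, warmup_steps=500)
mcmc.run()
rate = mcmc.get_samples()["rate"]

# The "risk" of acting on this model is just a posterior probability,
# e.g. the chance the true rate sits below a hypothetical 10% break-even point.
print((rate < 0.10).float().mean())
```

The point is that the risk query is one line once you have posterior samples, whereas a frequentist analysis typically needs a purpose-built procedure for it.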
I've been involved in the above scenarios and have seen clear advantages of using Bayesian inference, both in academia and private sector.
With that being said, I don't think Bayesian inference, and thus even less so PPLs, is ever going to "take off" in a similar fashion to many other machine learning techniques. The reasons for this are fairly simple:
1. It's difficult. Applying these techniques efficiently and correctly is way more difficult than standard frequentist methods (even interpreting the results is often non-trivial).
2. The applicability of Bayesian inference (and thus PPLs) is just so much more limited due to the computational complexity + reduction in utility of the methods as data increases (which, for private companies, is more and more the case).
PPLs mainly try to address (1), and we do have very successful examples of this, e.g. PyMC3 (they also have a bunch of nice examples of applying Bayesian inference in a private-sector context), and Stan (maybe more heavily used in academia).
> It's difficult. Applying these techniques efficiently and correctly is way more difficult than standard frequentist methods
Do you have any good resources/examples for applying these methods effectively? I've read Statistical Rethinking which is a good introduction to these methods at a high level but I find when I dig into an actual problem I have a lot of gaps and wish there were more real world code examples I could learn from.
I think Bayesian Data Analysis is the natural progression step.
Not sure if there is a more recent book that's updated to use modern Stan examples, but the Stan user guide itself has developed into a very useful resource on its own. It contains a large number of example models and builds up concepts incrementally. The writing style is also generally easy to follow.
It knows nothing of the modern stuff (because MacKay died too early), but if you skip the first parts of David MacKay's Information Theory, Inference, and Learning Algorithms, you get a very accessible course in (200x-era) Bayesian inference that should cover most of what you need for diving into PPL applications.
In my case, I used it in an actual course on Bayesian inference. Looking back over the material it doesn't seem particularly complicated for anyone with a solid probability background, but maybe the concepts are hard if you aren't seeing them presented nicely in a lecture setting.
They've really taken off in niche places. If you have a complex model of something, it's dramatically easier to use one of these to build/fit your model than it is to code it by hand.
But those cases are still ones where you might have just a dozen variables (though each might be a long vector). It's more the realm of statistical inference than it is general programming or ML.
It hasn't "taken off" in ML because ML problems generally have more specific solutions based on the problem. If you have something simple and tabular, other solutions are generally better. If you have something recsys shaped, other solutions are generally better. If you have something vision/language shaped, other solutions are generally better.
It hasn't "taken off" in general programming because PPLs generally have trouble with control flow. Cutting off an entire arm of a program is trivial in a traditional language, but in PPLs you'll have to evaluate both. If the arm is a recursion step and hitting the base case is probabilistic, you might even have to evaluate arbitrarily deep (or you approximate that in a way that significantly limits the breadth of techniques available for running a program).
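To make the recursion point concrete, here is a sketch of such a program in Pyro-flavoured Python (the names `geometric` and `stop_{t}` are just illustrative):

```python
import pyro
import pyro.distributions as dist

def geometric(p, t=0):
    # Each call flips a coin; the recursion depth is itself a random variable.
    stop = pyro.sample(f"stop_{t}", dist.Bernoulli(p))
    if stop.item() == 1.0:
        return t                    # base case reached probabilistically
    return geometric(p, t + 1)      # otherwise recurse one level deeper
```

Forward sampling is trivial, but an inference engine conditioning on the return value has no bound on how many `stop_*` choices it may need to consider.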
AFAICT, a truism in PPL is that there are always programs that your language will run poorly on but a bespoke engine will do better, by an extreme margin. There just aren't general languages that perform as reliably as in deterministic languages.
It's also just really really hard. It's roughly impossible to make things that are easy in normal languages easy to work with in PPLs. Consider these examples:
`def f(x, y): return x + y + noise` where you condition on `f(3, y) == 5`. It's easy.
`def f(password, salt): return hash(password + salt)` where you condition on `f(password, 8123746) == 1293487`. It's basically not going to happen even though forward evaluation of f is straightforward in any traditional language.
Hell, even just supporting `def f(x, y): return x+y` is hard to generalize. Surprisingly it's harder to generalize than the `x+y+noise` case.
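For what it's worth, the first ("easy") case really is a few lines in something like Pyro. A hedged sketch, where the prior on `y` and the noise scale are assumptions I'm adding for illustration:

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

def model():
    y = pyro.sample("y", dist.Normal(0.0, 10.0))     # vague prior over the unknown y
    # f(3, y) = 3 + y + noise, observed to equal 5; the noise is absorbed into the likelihood.
    pyro.sample("obs", dist.Normal(3.0 + y, 1.0), obs=torch.tensor(5.0))

mcmc = MCMC(NUTS(model), num_samples=500, warmup_steps=200)
mcmc.run()
print(mcmc.get_samples()["y"].mean())   # posterior mean of y, close to 2
```

The second unknown (`noise`) isn't solved for; it's integrated out by the likelihood, which is why the problem is well-posed despite "two unknowns, one equation".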
I think you’re overgeneralizing in your control flow discussion.
I also don’t understand your f example with (x, y, noise): if you fix x and the return value, you still have two unknowns and one equation. How is that easy to solve?
Unless you’re considering using parametric inverses to represent the solution — but you didn’t mention this so I assume you didn’t mean this.
I have spent a lot of time trying to use PPLs (including Pyro, Edward, numpyro, etc.) in Real World data science use cases, and many times mixing probabilistic programming (which in these contexts means Bayesian inference on graphical models) and deep networks (lots of parameters) doesn't work simply because you don't have very strong priors. There are cases where these are considered very effective (e.g. medicine, econometrics, etc.) but I haven't worked in those areas.
NUTS-based approaches like Stan (and numpyro) have more usage, and I think Prophet is a good example of a generalizable (if limited) tool built on top of PPLs.
Pyro is a very impressive system, as is numpyro, which I think is the successor since Uber AI disbanded (it's much faster).
It's much more expensive to train models. Besides, compilers are not that smart yet. E.g. an HMM implemented in a PPL is far from the efficiency of hand-rolled code. For many use cases, they are still a leaky abstraction.
However, in areas where measuring uncertainty is important, they have taken off. Stan has become mainstream in Bayesian statistics. Pyro and PyMC are also quite used in industry (I have had recruiters contacting me for this skill). Infer.NET has its own niche on discrete and online inference. Infer.NET models ship with several Microsoft products.
Other interesting PPLs include Turing.jl, Gen.jl, and the venerable BUGS.
Sure. It's hard to do this justice in a single comment, as there are lots of applications. Basically, any scenario with small or medium-sized datasets where you would use generative models. PPLs are just a way to encode generative models and get an inference engine compiled for you, instead of needing to write one. For example:
* Landing the Apollo spacecraft on the Moon, or the tracking systems used by e.g. the Sidewinder missile, which employ Kalman filters. See Example 24.4 (p. 510) [1].
* Predicting ride demand on heavy-tailed time series. Uber does this all the time [2].
* Estimating the effect of some policy on data with hierarchical structures (State > County > Individual observations) [3]. A sketch of this case follows below.
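For the hierarchical case, here is a minimal partial-pooling sketch in Pyro; the variable names, index tensors, and noise scales are all placeholders of mine, not the model from [3]:

```python
import torch
import pyro
import pyro.distributions as dist

def model(state_of_county, county_of_obs, y=None):
    # state_of_county: long tensor mapping each county to its state
    # county_of_obs:   long tensor mapping each observation to its county
    n_states = int(state_of_county.max()) + 1
    n_counties = len(state_of_county)

    mu = pyro.sample("mu", dist.Normal(0.0, 5.0))                     # national-level mean
    with pyro.plate("states", n_states):
        state_eff = pyro.sample("state_eff", dist.Normal(0.0, 1.0))   # per-state offsets
    with pyro.plate("counties", n_counties):
        # Counties are shrunk toward their own state's effect.
        county_eff = pyro.sample("county_eff",
                                 dist.Normal(state_eff[state_of_county], 0.5))
    with pyro.plate("obs", len(county_of_obs)):
        pyro.sample("y", dist.Normal(mu + county_eff[county_of_obs], 1.0), obs=y)
```

Fit it with NUTS or SVI as usual; the partial pooling across State > County > Individual falls out of the nested sampling statements rather than needing bespoke code.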
These things go in and out of fashion. Now it's LLMs' turn to have their fifteen minutes.
I think one reason why Bayesian models have not taken off is that representing prediction uncertainty comes at the expense of accuracy, for a given model size. People prefer to devote model capacity to reducing the bias rather than modeling uncertainty.
Bayesian models make more sense in the small-data regime, where uncertainty looms large.
I don’t think the field has converged on “the right abstractions” yet.
It’s an active area of programming language research; it feels similar to where AD was for a while.
I work on this stuff for my research, so I do believe that there is a really good set of abstractions. My lab has had good success at solving problems with these abstractions, including problems you might not think are amenable to Bayesian techniques or would scale well with them, like pose or trajectory estimation and SLAM with renderers in the loop.
Other PPLs I’ve studied also have a mix of these abstractions, but make other key design distinctions in interface / type design that seem to cause issues when it comes to building modular inference layers (or exposing performance optimization, or extension).
I also often have the opinion that the design choices taken by other PPLs feel overspecialized (optimized too early, for specific inference patterns). I’m not blaming the creators! If you set out to design abstractions, you often start with existing problems.
On the other hand: if you’re just solving similar problem instances over and over again, in increasingly clever ways — what’s the point? Unless: (a) these problems are massive value drivers for some sector (b) your increasingly clever ways are driving down the cost, by reducing compute, or increasing speed.
I think PPLs which overspecialize to existing problems are useful, but have trouble inspiring new paradigms in AI (or e.g. new hardware accelerator design, etc).
Partially this is because there’s an upper bound on the inference complexity which you can express with these systems — so it is hard to reach cases where people can ask: what X application would this enable if we could run this inference approximation 1000x faster?
(Also note that inference approximations _can_ include neural networks)
I'm no authority on the subject, but FWIW I tried quite a bit to make various bayesian methods work for me. I never found them to outperform equivalent frequentist (point estimate) methods.
Modelling uncertainty sounds nice and sometimes is a goal in itself, but often at the end of the day you need a point estimate. And then IME all the priors, flexible models, parameter distributions, just don't add anything. You could imagine they do, with a more flexible model, but that is not my experience.
But then, PPLs are just so much harder. The initial premise is nice - you write a program with some unknown parameters, you have some inputs and outputs, and you get some probabilistic estimates out. But in practice it is way more complex. It can easily and silently diverge (i.e. converge to a totally wrong distribution), and even plain vanilla Bayesian estimation is a dark art.
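As a small illustration of what "checking that it didn't silently diverge" looks like in practice, here is a toy Pyro model with synthetic data, just to show the workflow (the model and data are made up):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

def model(y):
    mu = pyro.sample("mu", dist.Normal(0.0, 10.0))
    sigma = pyro.sample("sigma", dist.HalfNormal(1.0))
    with pyro.plate("data", len(y)):
        pyro.sample("obs", dist.Normal(mu, sigma), obs=y)

y = 2.0 + 0.5 * torch.randn(50)                  # synthetic data
mcmc = MCMC(NUTS(model), num_samples=1000, warmup_steps=500)
mcmc.run(y)
mcmc.summary()   # r_hat far from 1.0 or a tiny n_eff is usually the only hint
                 # that the sampler quietly failed while still returning samples
```

And that is the easy part; diagnosing *why* it failed is where the dark art starts.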
You need to be intimately aware of your data input, the models you’re proposing, and initialisations.
Practically, this means iteratively visualising your data and making informed judgements just to get your model to run at all.
The ML promise, by contrast, is that there are robust models you can feed nearly unlimited amounts of data to get better predictions, without humans thinking much about the structure of the data and model.
Probabilistic modeling is better for people who have a fixed dataset they can visualise and fit an elegant model that incorporates lots of prior information about the problem of interest.
Though I see what you're saying, using a PPL for VAEs just seems like overkill given the simplistic nature of VAEs.
PPLs are useful when the data generation process is not easily represented by something like a simple multivariate Gaussian, etc. You find many good examples in academic research, e.g. epidemiology.
Yes, but mathematical integration (needed to solve the Bayesian equations) is difficult, and the higher the dimension, the more difficult it is. That's why differentiation is preferred. The concepts behind PPLs are firmly entrenched in probabilistic ML; the ideas were never lost.
Making a new language is not the way to do it. A new language means you wipe out all the tools you were using before, from syntax highlighting to libraries to optimizations. Even languages like java, go, julia, lua and D worked on their garbage collection for at least a decade.
Not only that, there is no reason why the math can't be done in a library and used in another language in the first place.
I wish Pyro would do a better job of hiding the implementation details. I shouldn't need to understand variational inference and such just to get the probability of a god dang hot dog. I've tried to use Pyro a few times, but every time I spend more effort trying to understand poutines and such instead of modeling my problem.
And I wish they would merge it with the beautiful explanations at https://probmods.org/. We need a practical probabilistic programming language in Python. We have PyMC, but to use that you have to pull out your old notes on Theano.
It's a remote object model, very similar in spirit to CORBA. This allows the object creator/user and the object itself to be in different fault domains - which makes it all too easy to lose track of objects and leak them, unless you've added significant management scaffolding.