So You Think You Have a Power Law (2007) (bactra.org)
130 points by tosh on Feb 19, 2018 | 35 comments


Rule 6 of Akin's Laws of Spacecraft Design:

6. Everything is linear if plotted log-log with a fat magic marker.

http://spacecraft.ssl.umd.edu/akins_laws.html


I was surprised to not see my favorite NASA quote... "no situation can be so bad that it can't be made worse"


Oh wow

35. (de Saint-Exupery's Law of Design) A designer knows that he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.

Civilization IV had that as a particular technology quote and I can't remember which one

Edit: Oh, of course. "Engineering"!


Back in my academia years, I was really frustrated with that group of researchers, "cult of power law" I called them - mainly centered around the Santa Fe Institute.

In all their talks, the pattern was always the same:

1) Take a dataset from some random system and strip it of all its domain context ("interdisciplinary" research).
2) Brag about being a "physicist" and thus applying a "physicist's" approach to new areas of research.
3) Plot the data on a log-log scale; it kind of looks like a power law.
4) Make a toy model and use it to "simulate the system".
5) Plot simulation vs. real data on a log-log scale; they kinda look the same.
6) Promise that your trivial little model will reveal whole new horizons for that field you know nothing about, because others are stupid and you're a "physicist".
7) Write a grant proposal.


Power laws and scale-free networks are discussed in https://arxiv.org/abs/1801.03400 with HN comments at https://news.ycombinator.com/item?id=16144867. Both this and the cited paper are worthwhile reads for power law users.


You can't use a goodness of fit test to claim that your data follows a power law (or any distribution). You can only use a GoF test (such as Kolmogorov-Smirnov) to collect evidence that your data don't follow some hypothesized distribution. And if you collect enough data, your GoF test will reject every hypothesized distribution.
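
A minimal sketch of that last point, assuming a hand-rolled one-sample Kolmogorov-Smirnov statistic and the usual asymptotic 5% threshold of 1.358/sqrt(n) for a fully specified null. The rates 1.0 and 1.05, the sample sizes, and the function names are all made up for illustration. The data come from a distribution that is only slightly different from the hypothesized one, and the same small mismatch that a small sample may not detect becomes a near-certain rejection once n is large enough.

```python
import math
import random

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and a hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

def hypothesized_cdf(x):
    # Null hypothesis (illustrative): exponential with rate 1.
    return 1.0 - math.exp(-x)

rng = random.Random(0)
results = {}
for n in (100, 10_000, 200_000):
    # The data actually come from a slightly different rate (1.05).
    sample = [rng.expovariate(1.05) for _ in range(n)]
    d = ks_statistic(sample, hypothesized_cdf)
    threshold = 1.358 / math.sqrt(n)  # asymptotic 5% critical value
    results[n] = d > threshold
    print(n, round(d, 4), round(threshold, 4), results[n])
```

The true gap between the two CDFs here is about 0.018, while the threshold shrinks like 1/sqrt(n), so rejection is only a matter of collecting enough data.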


That's where #6 comes in:

>Use Vuong's test to check alternatives, and be prepared for disappointment. Even if you've estimated the parameters of your power law properly, and the fit is decent, you're not done yet. You also need to see whether other, non-power-law distributions could have produced the data. This is a model selection problem, with the complication that possibly neither the power law nor the alternative you're looking at is exactly right; in that case you'd at least like to know which one is closer to the truth.
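
A rough sketch of what such a comparison can look like, not the article's own code. It assumes a continuous power law on x >= xmin with a known xmin, and it picks a shifted exponential as the alternative purely for illustration (the function name and the synthetic data are hypothetical too). It computes the normalized log-likelihood ratio that Vuong's test is built on: positive z favors the power law, negative z favors the alternative, and the p-value says whether the sign can be trusted.

```python
import math
import random

def vuong_power_law_vs_exponential(xs, xmin):
    """Normalized log-likelihood ratio; positive z favors the power law."""
    tail = [x for x in xs if x >= xmin]
    n = len(tail)
    # Maximum-likelihood fits of both candidates on the tail x >= xmin.
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    lam = 1.0 / (sum(tail) / n - xmin)
    # Pointwise log-likelihood ratios: power law minus shifted exponential.
    ratios = [
        (math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin))
        - (math.log(lam) - lam * (x - xmin))
        for x in tail
    ]
    mean = sum(ratios) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in ratios) / n)
    z = math.sqrt(n) * mean / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value

# Synthetic data that is NOT a power law: a shifted exponential above xmin = 1.
rng = random.Random(1)
xs = [1.0 + rng.expovariate(1.0) for _ in range(2000)]
z, p = vuong_power_law_vs_exponential(xs, xmin=1.0)
print(z, p)
```

On this synthetic sample z comes out clearly negative, which is the "disappointment" case: the power-law fit runs without complaint, but the alternative explains the data better.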


Note that paper appeared in SIAM, not "Reviews of Modern Physics" or some place where physicists might read it.

(Speaking as someone who wrote a probably invalid paper about power laws and who had Mark Newman working just across the hall in the 1990s)


Can anybody say which kind of function fits these graphs?

Distribution of atoms in the Solar System, looks like cos(m): https://upload.wikimedia.org/wikipedia/commons/e/e6/SolarSys...

Star clusters, looks like sin(s): http://cdn.iopscience.com/images/0004-637X/725/2/1717/Full/a...

Exoplanets, looks like sin(m): http://exoplanets.co/img/exoplanets-mass-distribution.jpg (more at http://exoplanetsdigest.com/author/yaqoob/ )


Star clusters look normally distributed. Exoplanets look like two normals added together (probably because of two different detection procedures).

Distribution of atoms in the Solar System is the weird one. No idea what it is.


Yeah, the distribution of elements is weird and wonky because it depends on a lot of details about nuclear physics. First off, nuclei with an odd number of protons or neutrons are less stable than nuclei with even numbers. That is what causes the staggered bumps every other element.

The low-mass region is largely determined by stellar nucleosynthesis. After helium, it largely works by shoving more helium nuclei on. So you get lots of carbon-12 and oxygen-16, which are kind of like 3 or 4 helium nuclei stuck together.

The heavier regions get really weird, because those depend on the r-process. Some astrophysical events spit out heaps and loads of neutrons, which stick to nuclei. You go way the heck away from stability, then beta-decay back once things calm down. What products you get depends on the reaction rates of hundreds of possible reactions, few of which can be experimentally measured in a lab, because there is no way to make that strong of a neutron source.

Explaining that distribution is a topic of many, many dissertations, and does not in any way reduce down to a simple law.

Source: am nuclear physicist


Any similar articles or advice about power law relationships in general (not distributions)? I've fit a lot of data to power law relationships in the past but don't know if there are any non-obvious pitfalls. I can recognize when a power law obviously won't work, but as has been said, a lot of data can look like a power law relationship. So for each obvious failure of a power law relationship, there are a certain number of false positives.
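
One pitfall can be sketched in a few lines, assuming the usual approach of ordinary least squares on log x and log y (the function name and both datasets are invented for illustration). The fit always returns an exponent and an r^2, even for data that no single exponent describes, so a decent-looking r^2 in log space is weak evidence on its own.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log a + b log x; returns (a, b, r2 in log space)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    syy = sum((v - my) ** 2 for v in ly)
    b = sxy / sxx
    a = math.exp(my - b * mx)
    r2 = (sxy * sxy) / (sxx * syy)
    return a, b, r2

xs = list(range(1, 101))
exact = fit_power_law(xs, [3.0 * x ** 0.5 for x in xs])  # a true power law
offset = fit_power_law(xs, [2.0 + x for x in xs])        # linear with an offset
print(exact)
print(offset)
```

The second dataset is y = 2 + x, whose local log-log slope drifts from 1/3 toward 1 across the range, so the single fitted exponent lands somewhere in between and describes neither end well. Checking residuals for curvature, or refitting on sub-ranges to see whether the exponent stays put, is a cheap guard against that kind of false positive.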


Any guess why their paper from June 2007, mentioned in the blog, was only released in February 2009? Seems like a long revision period.

Nothing important, it just threw me a bit off guard seeing the date of the blog post and the author's own prediction "forthcoming (2009)". But maybe he edited the blog once he knew when the paper finally came out.


I know for a fact that Clauset, Shalizi, and Newman first submitted their article to Reviews of Modern Physics. It was rejected there and then submitted to SIAM Review, which published it after one round of revision. SIAM Review back then was not known for being a "speedy" journal.


Title should have: (2007).


Added. Thanks!


IMHO this feels like quibbling as the author doesn't make clear when this would actually make a difference.


> the author doesn't make clear

He makes it across several points (4, 6, 7, and maybe 1): using a power law gets tail estimates very badly wrong (or they fit the tail and then get the distribution over the rest of the domain wrong).


This is a fascinating article that provides lots of worthwhile information for anyone planning on fitting a power law to their data.

I will probably never read another of the author's writings due to the pervasive negativity.


I don't see any negativity in this article, outside of maybe the headline and the first paragraph. It consists almost entirely of constructive suggestions for how to do data analysis.


Why is "pervasive negativity" a reason to ignore someone insightful? Do what you will, but you should at least check out the author's meticulous beat-down of Stephen Wolfram for peddling nonsense: http://bactra.org/reviews/wolfram/


Having dealt with the scale free network nonsense for a couple of years, I think he's being remarkably positive about a completely dysfunctional area of research.


I used a power series to approximate the radius of a planet from its mass for a WebGL space game I'm making [0] where solar systems are randomly generated. I needed something to roughly approximate the size of a planet or star based solely on mass. Using the 8 planets (plus Sedna and Pluto) and our star to generate the curve function, I got an r^2 value of 0.989. [1]

I'm not about to make some scientific claim based on it, but for my purpose (a neat game) using a power series to approximate radius from mass was an extremely efficient and simple solution to my problem.

[0] http://thedagda.co:9000/

edit: if you have an xbox controller you can use this: http://thedagda.co:9000/?gamepad=true

[1] https://docs.google.com/spreadsheets/d/1GKPNNMJrZMaf8aQqgGD-...


I'm not seeing the relevance of this to the article? Perhaps you have conflated "power-series" with "power-law"?

I'm sure there are merits to your comment, but I don't think it should be the top-voted comment on a subject that is completely different from what you're commenting about.


It's at least tangentially relevant. I naively took some data and threw some curve fits at it until I had a function that reasonably matched reality for the observed data.

When I set the axes to log/log I get a straight line, which is his first point.


The article is talking about power law probability distributions [0].

What you seem to be doing is fitting a curve with a term that has an exponent, which is a power-law relationship.

That is not what the article is talking about, unless you're claiming the residuals of your fit follow a power law distribution.
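
To make the distinction concrete, here is a small sketch of the distribution case, assuming a continuous power law p(x) ∝ x^(-alpha) on x >= xmin. The values alpha = 2.5 and xmin = 1, and both function names, are arbitrary choices for illustration. There is no y-versus-x curve here at all: there is only a sample of values, and the exponent is estimated by maximum likelihood rather than by fitting a line.

```python
import math
import random

def sample_power_law(rng, alpha, xmin, n):
    """Inverse-transform sampling from p(x) proportional to x^(-alpha), x >= xmin."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def mle_alpha(xs, xmin):
    """Maximum-likelihood exponent for a continuous power law above xmin."""
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

rng = random.Random(42)
xs = sample_power_law(rng, alpha=2.5, xmin=1.0, n=50_000)
alpha_hat = mle_alpha(xs, xmin=1.0)
print(alpha_hat)
```

The estimate has a standard error of roughly (alpha - 1)/sqrt(n), so with 50,000 draws it should land very close to 2.5.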

I'm not blaming you for latching onto the term "power-law" -- it's a simple enough mistake -- but HN upvoters should recognize that this is not what the article is talking about at all.

[0] https://en.wikipedia.org/wiki/Power_law#Power-law_probabilit...


If you are thinking of using a log-log plot to demonstrate something, please don't. Log-log plots are extremely deceptive.

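All monomials (even x^99) are straight lines on a log-log plot; the exponent only changes the slope. A polynomial with several terms also looks straight wherever one term dominates.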

http://www.cs.utsa.edu/~cs1173/experiments/Experiment2Logari...
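
A quick numerical check of how little "straight on log-log" tells you, as a sketch with a hypothetical helper that estimates the local slope d(log f)/d(log x). For a pure power x^k the slope is k everywhere, so x^99 is exactly as straight as x^2. A polynomial with several terms is only straight where one term dominates.

```python
import math

def loglog_slope(f, x, h=1e-4):
    """Numerical slope d(log f)/d(log x) at x, by central difference in log x."""
    lo, hi = x * math.exp(-h), x * math.exp(h)
    return (math.log(f(hi)) - math.log(f(lo))) / (2.0 * h)

def monomial(x):
    return x ** 99

def two_term(x):
    return x ** 2 + 100.0 * x

for x in (0.1, 1.0, 10.0, 100.0):
    print(x, loglog_slope(monomial, x))      # stays at 99 across the range

for x in (1.0, 100.0, 10_000.0):
    print(x, loglog_slope(two_term, x))      # drifts from about 1 toward 2
```

The second function has slope (2x + 100)/(x + 100), so over any narrow range of x it still looks like a straight line with some intermediate exponent.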


It's the r^2 that matters right? I could give a crap about the picture.


Power series and power law distributions are completely different things


Yeah that's a power law. It would have been pretty easy to see had you used a log-log plot.


Mar's Law: "Everything is linear [power law] if plotted log-log with a fat magic marker".


This is literally what the article is arguing against doing.


They're cautioning against drawing statistical conclusions from it. That doesn't mean the log-log plot isn't more useful than the linear plot.

Also they seem to be talking about a power law distribution, whereas here we're talking about the dependence of two variables. As a first step plotting them on a log-log plot and saying 'yeah, looks straight' is fine. Among other things this tells us the dependence is suspiciously close to a cubic root law, which could possibly be justified using dimensional analysis.
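
A sketch of where a cube-root dependence would come from, assuming (this is an assumption, not something established in the thread) that every body is a sphere with roughly the same mean density ρ:

```latex
M = \tfrac{4}{3}\pi \rho R^{3}
\quad\Longrightarrow\quad
R = \left(\frac{3M}{4\pi\rho}\right)^{1/3}
\quad\Longrightarrow\quad
\log R = \tfrac{1}{3}\log M + \tfrac{1}{3}\log\frac{3}{4\pi\rho}
```

Under that assumption the log-log plot is a straight line with slope 1/3. Real planets and stars differ a lot in density, so a fitted slope near 1/3 is suggestive rather than a confirmation.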


Well you're right, but that doesn't change the r^2 value or the curve function. Not trying to do science here.


Thanks for the tip! That looks much better.



