To this end, I'm putting the discussion of the rest of Steve Frank's paper temporarily on hold.
Today's post begins a discussion of the paper "Power laws, Pareto distributions, and Zipf's law", by Mark Newman (Contemporary Physics, 46, 323-351 (2005)). I'm going to split this into a few posts so that I have some hope of actually posting something.
Mark's paper is a great, extremely clear review of some ideas related to power laws. Mark is a terrific writer. He's also Professor of Physics and Complex Systems at the University of Michigan, and External Professor at the Santa Fe Institute.
Mark is also the author of a recent textbook on networks, Networks: An Introduction (Oxford University Press, 2010).
An interesting (unrelated) fact about Mark, given that this year is the 100th anniversary of Alan Turing's birth: Mark's grandfather, Max Newman, while a mathematics lecturer at Cambridge, introduced Turing to the "Entscheidungsproblem" (the "decision problem"). That problem inspired Turing's invention of the "Turing machine", which arguably gave rise to the invention of programmable computers.
Anyway, back to the paper at hand.
Power Laws Versus Normal Distributions
Just for reference, here are pictures of a Gaussian or "Normal" distribution (left/top) and a power-law distribution (right/bottom):
Recall that a plot of a "distribution" shows some quantity (e.g., SAT scores) on the x-axis versus the observed frequency or probability of each value on the y-axis. When probabilities (rather than raw frequencies) are plotted on the y-axis, the values sum to 1.
There are some interesting differences to note between the Gaussian and the power-law distributions. First, the Gaussian is symmetrically peaked around a small range of "typical" values, the middle of which happens to be the mean of the distribution. Also, the distribution falls off to (very close to) zero on either side. The range of values on the x-axis over which the distribution is appreciably non-zero is called the "scale". The power-law distribution is peaked at the lowest value on the x-axis and decreases for higher values. It falls off more slowly than the Gaussian distribution, resulting in a so-called "long tail" or "heavy tail". It doesn't have an obvious small range of "typical" values in the way that the Gaussian does.
In terms of probabilities, it's clear that for the Gaussian, "extreme events" (e.g., very low or very high SAT scores) have very low probability compared to values near the mean. But in the power-law distribution, such extreme events are much more probable than in the Gaussian distribution, due to the long tail. This is one of the more important implications of power-law distributions in the real world. As McKelvey and Andriani point out: "The lesson we can draw...is that extreme events, which in a Gaussian world could be safely ignored, are not only more common than expected but also of vastly larger magnitude and far more consequential." [1]
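To put rough numbers on the "long tail", here is a minimal sketch comparing the probability of exceeding a threshold under a standard Normal distribution and under a continuous power law. The exponent alpha = 2.5 and cutoff x_0 = 1 are illustrative assumptions, not values from Newman's paper. The Normal is measured in standard deviations and the power law in units of x_0, so this only shows how fast each tail shrinks; it is not a like-for-like model of any real quantity.

```python
import math

def normal_tail(z):
    """P(Z > z) for a standard Normal (mean 0, standard deviation 1)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def power_law_tail(x, alpha=2.5, x0=1.0):
    """P(X > x) for a continuous power law p(x) = C x^(-alpha), x >= x0.

    Integrating the normalized density from x to infinity gives
    (x / x0)^(-(alpha - 1)); below the cutoff the probability is 1.
    """
    if x <= x0:
        return 1.0
    return (x / x0) ** (-(alpha - 1))

# Compare how quickly the two tail probabilities shrink as the threshold grows.
for threshold in (2, 5, 10, 20):
    print(f"{threshold:>3}: normal {normal_tail(threshold):.2e}   "
          f"power law {power_law_tail(threshold):.2e}")
```

At a threshold of 10, for instance, the Normal tail probability is roughly 8 × 10^-24, while the power-law tail is 10^-1.5, about 0.03.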
Examples of Power Laws
In his paper, Mark Newman gives a long list of examples of (purported) power law distributions in natural and technological systems, including:
- Word frequency in natural language (the most frequent words are vastly more frequent than the least frequent words)
- Citations of scientific papers (There are a small number of papers with a huge number of citations and a very large number of papers with no (or very few) citations)
- Magnitudes of earthquakes (Very small earthquakes are common; very large earthquakes are rare)
- Intensities of wars
- Wealth of richest people
- Populations of cities
unreasonably, claim that power-law distributions have been observed in language, demography, commerce, information and computer sciences, geology, physics and astronomy, and this on its own is an extraordinary statement."
Mathematics of Power Laws
Here is the mathematical form of a power law:

$$P(x) = C\,x^{-\alpha}, \qquad x \ge x_0$$
That is, the probability that some quantity (e.g., earthquake size) has value x is equal to a constant (C) times x raised to the power -alpha. The constant C normalizes the distribution -- i.e., makes all the probabilities sum to 1. The inequality on the right says that the power-law relationship holds only for x greater than or equal to some minimum value x_0.
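For the continuous version of this distribution (an assumption here: with a continuous x, the requirement that probabilities sum to 1 becomes an integral), the normalization condition pins down C directly:

$$\int_{x_0}^{\infty} C\,x^{-\alpha}\,dx \;=\; \frac{C\,x_0^{1-\alpha}}{\alpha - 1} \;=\; 1 \quad\Longrightarrow\quad C = (\alpha - 1)\,x_0^{\alpha - 1}, \qquad \alpha > 1.$$

The integral diverges when alpha <= 1, and x^(-alpha) blows up as x approaches 0, which is why a normalizable power law needs both alpha > 1 and a lower cutoff x_0 > 0.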
Suppose, for example, alpha = 2. Then we would have

$$P(x) = C\,x^{-2} = \frac{C}{x^{2}}$$
Suppose x_0 = 1. Then P(x) would be maximum when x = 1, would be 1/4 that value when x = 2, 1/9 that value when x = 3, etc. Imagine x represents earthquake size on the Richter scale. As we would expect, small earthquakes would have the bulk of the probability, whereas any particular large earthquake size (e.g., Richter scale 8) would be very unlikely. The scary thing is that every large earthquake size has some, albeit low, probability, so the total probability that a "big one" will happen is non-negligible. That is, under a power-law distribution a big one *will* almost certainly happen at some point. If earthquake sizes were Normally distributed, a big one would be much less likely.
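The arithmetic in this example can be checked with a short sketch. It assumes the continuous form of the power law, with C = (alpha - 1) x_0^(alpha - 1) so that the density integrates to 1; the helper names and the 20-event horizon are hypothetical, chosen only for illustration. The last helper makes the "big one will almost certainly happen" point concrete: if each event independently reaches some size with probability p, the chance that at least one of n events does is 1 - (1 - p)^n, which approaches 1 as n grows.

```python
def norm_const(alpha, x0):
    """C = (alpha - 1) * x0^(alpha - 1), making the density integrate to 1 (needs alpha > 1)."""
    return (alpha - 1) * x0 ** (alpha - 1)

def density(x, alpha, x0):
    """Power-law density C * x^(-alpha) for x >= x0, zero below the cutoff."""
    if x < x0:
        return 0.0
    return norm_const(alpha, x0) * x ** (-alpha)

def tail(x, alpha, x0):
    """P(X >= x) = (x / x0)^(-(alpha - 1)) for x >= x0 (continuous power law)."""
    return 1.0 if x <= x0 else (x / x0) ** (-(alpha - 1))

def at_least_one(p, n):
    """Probability that at least one of n independent trials succeeds, each with probability p."""
    return 1 - (1 - p) ** n

alpha, x0 = 2, 1
peak = density(1, alpha, x0)
print([density(x, alpha, x0) / peak for x in (1, 2, 3)])  # relative heights 1, 1/4, 1/9

p_big = tail(8, alpha, x0)  # chance that a single event has size >= 8
print(p_big, at_least_one(p_big, 20))
```

With these assumptions a single event reaches size 8 or more with probability 1/8, yet over 20 independent events the chance of at least one such event is roughly 0.93.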
Replacing "earthquake size" with "extreme financial crises" in the above, we get the example of the Great Recession of 2008, which, if such events are power-law distributed, was bound to happen. Evidently a lot of economists thought that such events were Normally distributed. They were probably wrong.
People often show power-law graphs on "double logarithmic" or "log-log" plots -- that is, the x and y axes are on a logarithmic scale rather than a linear one. (E.g., Richter scale readings of 1, 2, 3, ... correspond to measured wave amplitudes in the ratio 1 : 10 : 100, so the Richter scale is a base-10 logarithmic scale.) Let's do a bit of simple algebra, taking the logarithm of both sides of the power law:

$$\log P(x) = \log C - \alpha \log x$$
Assuming that the vertical axis plots log P(x) and the horizontal axis plots log x (i.e., a "log-log" plot), the right hand side of the above equation gives the expression for a straight line with slope -alpha and intercept log C. Thus, if you plot a power law on a log-log plot, you will see a straight line.
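One way to see the straight line is to take exact power-law values, move to log-log coordinates, and fit a line by ordinary least squares. In this minimal sketch the values C = 0.5 and alpha = 2.5 are arbitrary choices for illustration; the fit recovers slope -alpha and intercept log C. With real, noisy data, reading alpha off a log-log fit is considerably less reliable than this noise-free case suggests.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

C, alpha = 0.5, 2.5
xs = [1, 2, 5, 10, 50, 100, 1000]
log_x = [math.log10(x) for x in xs]                  # horizontal axis: log x
log_p = [math.log10(C * x ** (-alpha)) for x in xs]  # vertical axis: log P(x)

slope, intercept = fit_line(log_x, log_p)
print(f"slope = {slope:.4f} (expected {-alpha}), "
      f"intercept = {intercept:.4f} (expected log10 C = {math.log10(C):.4f})")
```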
The quantity that matters most for understanding a power-law distribution is the exponent alpha, which tells us something about the underlying process that creates the power law.
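Since alpha is the key quantity, it helps to see how one might estimate it from data. This is a minimal sketch assuming a continuous power law with a known cutoff x_0: it draws samples by inverse-transform sampling (solving u = 1 - (x/x_0)^(-(alpha-1)) for x) and then applies the standard maximum-likelihood estimator alpha_hat = 1 + n / sum(ln(x_i / x_0)). The true alpha = 2.5, the sample size, and the seed are arbitrary illustrative choices.

```python
import math
import random

def sample_power_law(n, alpha, x0, rng):
    """Inverse-transform sampling from a continuous power law with cutoff x0."""
    # rng.random() is in [0, 1), so 1 - u is in (0, 1] and every sample is >= x0.
    return [x0 * (1.0 - rng.random()) ** (-1.0 / (alpha - 1)) for _ in range(n)]

def estimate_alpha(data, x0):
    """Maximum-likelihood estimate of alpha for a continuous power law with known x0."""
    logs = [math.log(x / x0) for x in data if x >= x0]
    return 1.0 + len(logs) / sum(logs)

rng = random.Random(42)
data = sample_power_law(100_000, alpha=2.5, x0=1.0, rng=rng)
print(f"estimated alpha = {estimate_alpha(data, x0=1.0):.3f} (true value 2.5)")
```

Using a seeded `random.Random` instance keeps the example reproducible; with 100,000 samples the estimate typically lands within about 0.01 of the true exponent.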
I should mention that power laws don't only describe distributions such as probabilities of earthquake sizes -- they can describe scaling laws as well -- e.g., metabolic rate of an organism scales as mass raised to the 3/4 power (Kleiber's law) -- more on that in future posts.
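As a small illustration of a scaling law (as opposed to a distribution), Kleiber's 3/4-power relation implies that metabolic rate grows more slowly than mass, so metabolic rate per unit mass falls as mass to the -1/4 power. This sketch only compares ratios, so the unknown proportionality constant cancels; the helper name and example mass ratios are hypothetical choices for illustration.

```python
def kleiber_ratio(mass_ratio, exponent=0.75):
    """Ratio of metabolic rates B2/B1 when mass grows by mass_ratio, assuming B proportional to M**exponent."""
    return mass_ratio ** exponent

for mass_ratio in (2, 16, 10_000):
    rate = kleiber_ratio(mass_ratio)
    per_mass = rate / mass_ratio  # change in metabolic rate per unit of body mass
    print(f"mass x{mass_ratio}: metabolic rate x{rate:.3g}, per unit mass x{per_mass:.3g}")
```

For example, an organism 10,000 times heavier than another is predicted to have roughly 1,000 times the metabolic rate, or about one tenth the rate per unit of body mass.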
Why Care?
Why should we care whether something is a power law (versus some other distribution)? The form of the distribution can say a lot about the underlying process, which is usually what science is trying to get at. One problem, though: many different underlying processes produce power laws. Mark Newman's paper lists several possible mechanisms, some of which I will discuss in my next post.
Upcoming:
- Statistical properties of power-law distributions
- What it means for a distribution to be "scale-free"
- What do the exponents mean?
- What are Rank-Frequency plots, such as Zipf's law or the Pareto distribution?
- What are the mechanisms that might give rise to power-law or other heavy-tail distributions?
- Are we really seeing power laws, or just approximations to power laws, or (in Cosma Shalizi's words) hallucinations of power laws? Does it matter?
- Will I ever finish this post?
Stay tuned!
References
[1] B. McKelvey and P. Andriani, Why Gaussian statistics are mostly wrong for strategic organization. Strategic Organization, 3(2): 219-228, 2005.
[2] W. Willinger, D. Alderson, J. C. Doyle, and L. Li, More "normal" than normal: scaling distributions and complex systems. In: Proceedings of the 2004 Winter Simulation Conference, pp. 130-141. IEEE Press, Piscataway, NJ, 2004. ISBN 0-7803-8786-4.
Thanks for this very clearly written post. One minor nitpick: "The right hand side is the equation of a straight line" is only true within the log plot. On the linear scale it would still be a log function.
Thanks, CK. I have clarified this in the text.