What should a policy analyst know about the normal distribution?

Last year, Scioto Analysis conducted a policy analysis to evaluate alternatives for reducing carbon emissions in the state of Ohio. In order to test our models, we conducted a Monte Carlo simulation, the “gold standard” of sensitivity analysis in cost-benefit analysis and a tool we often employ to see the range of possible outcomes our models produce.

Below are the Monte Carlo simulation results for a strong cap-and-trade program. For this simulation, we ran hundreds of thousands of alternate policy scenarios, randomly generating different social cost of carbon estimates, discount rates, and price elasticities of demand for electricity.
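
For readers who want to see the mechanics, here is a minimal sketch of what a simulation like this can look like in code. Every distribution and dollar figure below is a hypothetical placeholder, not an input to or result of our actual Ohio analysis.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims = 200_000  # hundreds of thousands of simulated policy scenarios

# Randomly draw the uncertain inputs (distributions here are illustrative only)
social_cost_of_carbon = rng.normal(190, 60, n_sims)  # dollars per ton of CO2
discount_rate = rng.uniform(0.02, 0.07, n_sims)      # annual discount rate
elasticity = rng.normal(-0.3, 0.1, n_sims)           # price elasticity of electricity demand

# A stand-in net-benefit calculation (the real cost-benefit model is far more detailed)
tons_abated = 1_000_000 * -elasticity                # more elastic demand -> more abatement
benefits = social_cost_of_carbon * tons_abated / (1 + discount_rate) ** 10
costs = 20_000_000                                   # placeholder program cost
net_benefits = benefits - costs

print(f"Mean net benefit: ${net_benefits.mean():,.0f}")
print(f"90% of simulations fall between ${np.percentile(net_benefits, 5):,.0f} "
      f"and ${np.percentile(net_benefits, 95):,.0f}")
```

Plotting a histogram of the simulated net benefits is what produces the kind of bell-shaped picture shown in the results.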

If you’ve taken a statistics class, you’re probably familiar with the shape of this distribution. It is one of the most important shapes in statistical analysis and one we end up using a lot when we’re modeling policy outcomes: the normal distribution.

The normal distribution, often called a bell curve because of its shape, is one of the most universally recognized statistical concepts. It is intuitive, broadly applicable, and useful in simplifying complex concepts. Here we will briefly discuss a few important characteristics of the normal distribution and why they are important. 

Parameterization

The parameters of a statistical distribution are the things you need to know to fully understand it. For example, if we consider some binary outcome like flipping a coin (called a Bernoulli distribution), the only parameter we need to know to fully understand the range of outcomes is the probability of an event happening.

The normal distribution is fully described by just two parameters: its mean and its variance. We can always calculate the mean and variance of observed data, and the fact that these two numbers tell us everything we need to know about the distribution of unobserved data from the same distribution is extremely powerful.
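
As a concrete illustration, here is a small sketch using scipy.stats with made-up numbers. Once you specify p for a Bernoulli distribution, or a mean and variance for a normal distribution, every probability you might ask about follows from those parameters alone.

```python
from scipy import stats

# A Bernoulli distribution is fully described by one parameter: p
coin = stats.bernoulli(p=0.5)
print(coin.pmf(1))  # probability of "heads" -> 0.5

# A normal distribution is fully described by two parameters: mean and variance
# (scipy uses the standard deviation, the square root of the variance)
mean, variance = 100, 225
outcomes = stats.norm(loc=mean, scale=variance ** 0.5)

# With just those two numbers we can answer any question about the distribution
print(outcomes.cdf(115))    # share of values expected to fall below 115
print(outcomes.ppf(0.975))  # value below which 97.5% of observations should fall
```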

Symmetry and Outliers

Two other properties we will discuss together are symmetry and the rarity of outliers. The normal distribution is symmetric, meaning we should expect to observe values above and below the mean with equal likelihood. It also makes large outliers extremely uncommon: we should only observe a value more than four standard deviations above or below the mean about 0.006% of the time. These two characteristics often shape how we think about applying the normal distribution to real data.
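
Both properties are easy to check numerically. A quick sketch using the standard normal distribution:

```python
from scipy import stats

z = stats.norm(loc=0, scale=1)  # standard normal: mean 0, standard deviation 1

# Symmetry: the chance of landing one standard deviation below the mean
# equals the chance of landing one standard deviation above it
print(z.cdf(-1))  # ~0.159
print(z.sf(1))    # ~0.159 (sf is the upper tail, 1 - cdf)

# Outliers: values more than four standard deviations from the mean are rare
p_extreme = z.cdf(-4) + z.sf(4)
print(f"{p_extreme:.4%}")  # ~0.0063%
```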

Consider the distribution of incomes in the US. A small number of extremely high incomes skew the distribution heavily to the right, which makes fitting these data to a normal distribution difficult. If we just fit a normal distribution using the observed mean and variance of all incomes in the US, that distribution would expect extreme outliers in the negative direction as well, even though incomes that far below zero never actually appear in the data.
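
To see the problem, here is a sketch using simulated right-skewed “incomes” (a lognormal placeholder, not real income data). A normal distribution fitted to the observed mean and variance expects a sizable share of negative incomes that never appear in the data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated right-skewed "incomes" (a lognormal stand-in for real income data)
incomes = rng.lognormal(mean=10.8, sigma=0.9, size=100_000)

# Fit a normal distribution using the observed mean and standard deviation
fitted = stats.norm(loc=incomes.mean(), scale=incomes.std())

# The normal fit predicts a meaningful share of negative incomes...
print(f"Normal fit: share below zero = {fitted.cdf(0):.1%}")
# ...but the actual data contain none
print(f"Observed:   share below zero = {(incomes < 0).mean():.1%}")
```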

The Central Limit Theorem

Arguably the most important concept in statistics, the central limit theorem is certainly the most useful application of the normal distribution. There is a lot of rigorous math we will skip over here, but in short, the central limit theorem tells us that if we repeatedly take random samples from a population, the means of those samples will be approximately normally distributed, even if the underlying population is not.

Going back to the income example: if instead of using the mean and variance of all incomes to approximate a normal distribution, we took 500 random samples (with replacement) and calculated the mean of each one, we would find that those sample means do in fact follow a normal distribution quite well.
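
Here is a small sketch of that resampling idea, again using simulated skewed “incomes” as a stand-in for the real data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# A heavily right-skewed "population" of incomes (simulated, not real data)
population = rng.lognormal(mean=10.8, sigma=0.9, size=100_000)

# Take 500 random samples (with replacement) and record each sample's mean
sample_means = np.array([
    rng.choice(population, size=1_000, replace=True).mean()
    for _ in range(500)
])

# The population is heavily skewed, but the sample means are nearly symmetric
print(f"Population skewness:  {stats.skew(population):.2f}")
print(f"Sample-mean skewness: {stats.skew(sample_means):.2f}")
```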

The normal distribution gives us a way to mathematically describe what we expect to happen with large amounts of unobserved data. Understanding it at least at a surface level is valuable for policy analysts and policymakers, since it so often works its way into our assumptions, whether we realize it or not.