As a statistician, I tend to be inherently skeptical anytime I see a single number reported as a prediction for the outcome of a policy proposal. Point estimates are certainly useful pieces of information, but I want to know how likely these predictions are to come true. What is the range of possible outcomes?
In policy analysis, it can seem impossible to say with any certainty what the range of possible outcomes might be. What if one input in our predictive model is higher than we think, but another is lower than we think? What if there are so many inputs that it would be practically impossible to test for all of the different possible outcomes?
At times like these, we can turn to Monte Carlo simulation to estimate the variance of our predictions. From Scioto’s Ohio Handbook for Cost Benefit Analysis: “The essence of Monte Carlo simulation is to generate a large number of possible outcomes by varying all the assumptions in the analysis.” By changing all of the inputs at once over thousands of trials, we can more accurately measure the uncertainty in our predictions.
At their core, all statistical models are essentially just mathematical equations. Imagine we are considering building a new public swimming pool and want to conduct a cost-benefit analysis. In one model, the costs would be the construction and annual maintenance costs, and the benefits would be the average benefit per person multiplied by the number of people that we expect to use the pool.
Benefit per Person x Expected Visitors - Construction Cost - Annual Maintenance = Net Benefits
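To make the formula concrete, here is a minimal sketch of it as a Python function. The function name, the parameter names, and every dollar figure and visitor count are hypothetical placeholders chosen for illustration, not values from any real analysis.

```python
def net_benefits(benefit_per_person, expected_visitors,
                 construction_cost, annual_maintenance):
    """Net benefits exactly as written in the formula above."""
    return (benefit_per_person * expected_visitors
            - construction_cost
            - annual_maintenance)


# Hypothetical point estimates, invented purely for illustration.
point_estimate = net_benefits(
    benefit_per_person=12.0,       # dollars of benefit per visitor
    expected_visitors=50_000,      # visitors
    construction_cost=400_000.0,   # dollars
    annual_maintenance=60_000.0,   # dollars
)
print(point_estimate)  # 140000.0
```

Plugging in a single value per input gives a single number out, which is exactly the kind of point estimate that says nothing about the range of possible outcomes.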
The four inputs to this model are not fixed values, but instead are random variables. We can use observed data to create sensible estimates for these random variables, but at their core they are not deterministic.
We can define a random variable by its probability distribution. A probability distribution conveys two critical pieces of information: what all possible outcomes are and how likely those outcomes are. Once we have real data, we can say that we have an observation or a realization of a random variable. Observations and realizations are associated with a random variable and a probability distribution, but they are not random themselves.
Let’s apply this to our swimming pool example. To estimate the number of visitors our pool will have, we can collect data about the number of visitors other public pools have. To create a point estimate, we might take the average value of our observations and plug it into our formula. Repeat that process with the other three inputs and you have a basic cost benefit analysis.
However, if we assume that the visitor counts at public pools are all observations of the same random variable, then we can make some claim about the distribution of that random variable. An in-depth knowledge of statistics is needed to make and verify these distributional assumptions, but the point is that we are defining all the possible outcomes and how likely those outcomes are.
With the probability distribution defined, we can use statistical software to generate thousands of observations that all follow the same distribution. This gives us thousands of inputs into our equation and thousands of different results, meaning we can now analyze the range of possible outcomes.
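As a rough sketch of this step, the snippet below draws simulated visitor counts from an assumed distribution and pushes each draw through the net benefits formula while holding the other three inputs fixed. The normal distribution, its mean and standard deviation, and the fixed dollar values are all assumptions made up for illustration; a real analysis would choose and check the distribution against observed data.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the sketch is reproducible
N_DRAWS = 10_000

# Assumption: visitors are roughly normal with mean 50,000 and standard
# deviation 8,000, floored at zero because a visitor count cannot be negative.
visitor_draws = [max(0.0, rng.gauss(50_000, 8_000)) for _ in range(N_DRAWS)]

# The other three inputs are held at hypothetical point estimates for now.
BENEFIT_PER_PERSON = 12.0
CONSTRUCTION_COST = 400_000.0
ANNUAL_MAINTENANCE = 60_000.0

results = [
    BENEFIT_PER_PERSON * visitors - CONSTRUCTION_COST - ANNUAL_MAINTENANCE
    for visitors in visitor_draws
]

# With n=20, quantiles() returns 19 cut points at 5%, 10%, ..., 95%.
cuts = statistics.quantiles(results, n=20)
low, high = cuts[0], cuts[-1]
print(f"Mean net benefits: {statistics.mean(results):,.0f}")
print(f"5th to 95th percentile: {low:,.0f} to {high:,.0f}")
```

Instead of one number, we now have a spread of results, although so far it reflects uncertainty in only one of the four inputs.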
Monte Carlo simulation really begins to shine once we start defining the probability distributions for all the random variables in our equation. It helps at this step to think of the Monte Carlo simulation as happening in rounds.
In a single round of simulation, we generate an observation for each of our four random variables from their respective probability distributions. One round might have above-average values for the number of visitors and the benefit per visitor but below-average values for the cost variables, leading to well-above-average net benefits. Repeating this for a few thousand rounds will allow us to see the range of possible outcomes and, more importantly, the likelihood of each potential outcome.
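The rounds described above might be sketched as follows. Every distribution and parameter here is a hypothetical stand-in, and the sketch assumes the four inputs are drawn independently of one another, which may not hold in practice (for example, a more expensive pool might also attract more visitors).

```python
import random
import statistics


def simulate_round(rng):
    """One round: draw each of the four inputs once, then apply the formula."""
    # Assumed distributions, chosen only for illustration.
    benefit_per_person = rng.triangular(8.0, 18.0, 12.0)      # low, high, mode
    expected_visitors = max(0.0, rng.gauss(50_000, 8_000))    # mean, std dev
    construction_cost = rng.triangular(350_000.0, 550_000.0, 400_000.0)
    annual_maintenance = max(0.0, rng.gauss(60_000, 5_000))
    return (benefit_per_person * expected_visitors
            - construction_cost
            - annual_maintenance)


rng = random.Random(2024)   # fixed seed so the sketch is reproducible
N_ROUNDS = 5_000
outcomes = [simulate_round(rng) for _ in range(N_ROUNDS)]

# Summarize the range of outcomes and how likely a positive result is.
cuts = statistics.quantiles(outcomes, n=20)   # cut points at 5%, ..., 95%
p5, p95 = cuts[0], cuts[-1]
prob_positive = sum(1 for x in outcomes if x > 0) / N_ROUNDS

print(f"Median net benefits: {statistics.median(outcomes):,.0f}")
print(f"5th to 95th percentile: {p5:,.0f} to {p95:,.0f}")
print(f"Share of rounds with positive net benefits: {prob_positive:.1%}")
```

The share of rounds with positive net benefits is one way to express the likelihood of an outcome: it estimates the probability that the project pays off under the stated assumptions.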
These simulations often involve a lot of assumptions about the distributions of random variables. Understanding and checking these assumptions is required in order to generate meaningful results.
Monte Carlo simulation is a powerful tool for analysts to use. It goes beyond just offering a single estimate for a prediction, and provides deeper insight into the likely range of outcomes. If you have the time and the statistical background to perform a Monte Carlo simulation, doing so can dramatically improve the quality of your estimates.