P-Value
Categories: Metrics
A p-value is just a probability. Check the “p” in p-value. It literally stands for probability. But it’s a very specific probability. It’s the probability that we’d get a sample result at least as far from the “actual” value as the one we got, assuming the “actual” value really is correct.
Just to get a handle on this sample result versus actual result thing, let’s talk heights of American males. It’s widely believed (and backed up by a lot of data) that the average American male is 5 feet 8 inches tall. That’s our actual value. Let’s say we took a random sample of American men we saw on the street and got an average in that group of 5 feet 2 inches. The p-value for this sample versus actual scenario would represent the probability that we’d get an average height that small (or smaller) in our sample in a world where American men have a much larger actual average height. Honestly, it seems pretty unlikely that, if the average American male is 5’ 8”, we’d get an average of 5’ 2” in our group. In other words, it’s quite unlikely to get that sample result...meaning our probability, or p-value, is a very small value (like, maybe less than a 5% chance of that happening).
This process, where we compare a sample result to an actual value, is called a hypothesis test. We can run one to see if our sample average matches a stated average, or if a sample proportion or percentage is the same as a stated percentage. It works in lots of other scenarios too, like comparing sample standard deviations to actual ones. In all cases, we use our sample result and our actual value to come up with that p-value or probability. Typically, we calculate a p-value using a graphing calculator, spreadsheet, or website built for that purpose.
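For anyone who’d rather see the arithmetic than trust a calculator, here’s a minimal sketch of one common way to get a p-value for the height example: a one-sample z-test on the average. The function name is made up for illustration, and the standard deviation (3 inches) and the sample size (30 men) are assumed numbers, not real data. It also assumes sample averages are roughly normally distributed.

```python
from math import erfc, sqrt

def lower_tail_p_value(sample_mean, stated_mean, population_sd, n):
    """Probability of a sample average this small or smaller,
    assuming the stated average is correct (one-sample z-test)."""
    # How much sample averages typically wobble for samples of size n.
    standard_error = population_sd / sqrt(n)
    # How many standard errors our sample average sits from the stated one.
    z = (sample_mean - stated_mean) / standard_error
    # Standard normal CDF at z, written with erfc so tiny tails stay accurate.
    return 0.5 * erfc(-z / sqrt(2))

# Heights in inches: 5'8" is 68 and 5'2" is 62.
# population_sd=3 and n=30 are assumptions chosen only for illustration.
p = lower_tail_p_value(sample_mean=62, stated_mean=68, population_sd=3, n=30)
print(f"p-value: {p:.3g}")
```

With those assumed inputs, the p-value comes out far below 5%, which matches the intuition that a 5’ 2” average would be a very surprising sample. A sample average exactly equal to the stated one would give a lower-tail p-value of 0.5.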
It’s a bit like the “beyond a reasonable doubt” phrase in court cases. If the evidence shows guilt beyond a reasonable doubt, we support conviction, or the idea that the suspect is guilty of the charges. If the evidence is insufficient, or reasonable doubt remains, we support dismissing the charges. The p-value works the same way. If our p-value is quite small, meaning below some predetermined threshold like 1% or 5% or 10% (all commonly used cutoff values), we have evidence to support the idea that maybe the actual value isn’t correct after all. If the p-value is above the cutoff we picked, we don’t have enough evidence to say the actual value is wrong. That isn’t the same as proving it’s right.
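The cutoff comparison can be sketched in a few lines. The function name and the example p-value of 0.03 are hypothetical, and treating a p-value exactly equal to the cutoff as “small enough” is a common convention rather than a rule.

```python
def decide(p_value, cutoff=0.05):
    """Compare a p-value to a chosen cutoff and report the verdict."""
    if p_value <= cutoff:
        # Small p-value: the sample would be surprising if the stated value were true.
        return "evidence against the stated value"
    # Large p-value: not enough evidence, which is not proof the value is right.
    return "not enough evidence against the stated value"

# 0.03 is a made-up p-value used only to show how the cutoff changes the verdict.
for cutoff in (0.01, 0.05, 0.10):
    print(f"cutoff {cutoff:.2f}: {decide(0.03, cutoff)}")
```

Notice that the same p-value can lead to different verdicts depending on the cutoff, which is why the cutoff should be picked before looking at the data.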