Statistics Explained: Sensitivity vs. Specificity (with Positive Predictive Power thrown in)
It may not seem all that interesting or important in your everyday life, but the concepts of sensitivity and specificity are actually huge. For example, if you want to know if you are pregnant, or if you have coronavirus, or if that lump is cancerous, you’d better hope the test developers have thought about sensitivity and specificity. And if you are an attorney, you should make sure your expert has considered these issues for the tests on which they base their opinions.
The concepts are simple and they are related, but they are also a bit confusing, so here we go:
Sensitivity is the extent to which a medical or psychological test correctly identifies individuals who have the condition the test is measuring. For example, if a test is measuring whether or not you are pregnant, a highly sensitive test would catch most people who have the ‘condition’ of pregnancy. It is expressed as a proportion, so the sensitivity of a test can range from 0 to 1. The closer to 1 a test is, the higher the sensitivity. At 1, the test is perfect: it catches every single pregnant person with every test administration.
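To make that concrete, here is a minimal sketch of the arithmetic. The counts below are invented purely for illustration and do not come from any real test:

```python
# Hypothetical counts for an imaginary pregnancy test
true_positives = 95   # pregnant people the test correctly flags as pregnant
false_negatives = 5   # pregnant people the test misses

# Sensitivity: the share of truly pregnant test takers the test catches
sensitivity = true_positives / (true_positives + false_negatives)
print(sensitivity)  # 0.95 -- the test catches 95% of pregnant test takers
```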
But, high sensitivity on a test is not always great, especially when it comes at the expense of specificity.
Specificity is the extent to which a medical or psychological test correctly rules out people who do not have the condition. It is measured on the same 0 to 1 scale as sensitivity, and one would hope that a pregnancy test would have a specificity of exactly 1 among men who take the test: you don’t want a pregnancy test to ever tell a man he is pregnant.
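The same kind of sketch works for specificity, again with counts made up solely for illustration:

```python
# Hypothetical counts for the same imaginary pregnancy test
true_negatives = 90   # non-pregnant people correctly told "not pregnant"
false_positives = 10  # non-pregnant people wrongly told "pregnant"

# Specificity: the share of non-pregnant test takers the test correctly clears
specificity = true_negatives / (true_negatives + false_positives)
print(specificity)  # 0.90 -- the test correctly rules out 90% of non-pregnant test takers
```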
Tests that are highly sensitive (i.e., they catch most or all true positives) sometimes have to sacrifice specificity to do so, creating what are called ‘false positives’: people who do not have the condition but test positive anyway.
Imagine an extreme example: A new pregnancy test for people is developed. It consists of one question: Are you human? If you answer ‘yes’ to that question, the test says you are pregnant. If you answer ‘no,’ the test says you are not pregnant.
This new one-question pregnancy test would have a sensitivity score of 1. It would catch literally every person who is pregnant, because, by necessity, every person who is pregnant is human. But, its specificity would be essentially 0, because it would never correctly rule anyone out and would create millions of false positives. It would tell every man who took the test that he was pregnant, along with every non-pregnant woman who took it.
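Working that extreme example through with a made-up population of 1,000 test takers (the numbers are chosen only to keep the arithmetic simple):

```python
# Hypothetical population: everyone answers "yes, I am human,"
# so the one-question test calls every single person pregnant.
pregnant = 20          # test takers who are actually pregnant
not_pregnant = 980     # everyone else

true_positives = pregnant        # every pregnant person tests positive
false_negatives = 0              # no pregnant person is missed
false_positives = not_pregnant   # every non-pregnant person also tests positive
true_negatives = 0               # nobody is ever told "not pregnant"

sensitivity = true_positives / (true_positives + false_negatives)   # 1.0 -- perfect
specificity = true_negatives / (true_negatives + false_positives)   # 0.0 -- useless
```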
So obviously, it is important to take into account both sensitivity and specificity when evaluating whether the results of any particular test are reliable. And one way to better evaluate a test’s reliability is to understand the role of another statistical concept: Positive Predictive Power (PPP).
PPP, sometimes known as Positive Predictive Value, is the probability that someone with a positive score on a test actually has the condition. For a pregnancy test, it is the probability that a person with a positive result really is pregnant. In my example of the one-question pregnancy test with perfect sensitivity, the PPP would be terrible. Because every test taker gets a positive result, the PPP is simply the fraction of test takers who are actually pregnant. Considering roughly 50% of the world’s population is male and at any given moment in time most women are not pregnant, the PPP would never be able to get above 10-15%. It would essentially be useless. You might as well flip a coin.
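Using the same made-up 1,000-person population from the sketch above, the PPP falls straight out of the counts:

```python
# Same hypothetical population: every test taker gets a "pregnant" result.
true_positives = 20    # actually pregnant (and flagged, like everyone else)
false_positives = 980  # not pregnant, but flagged anyway

# PPP: of everyone who tests positive, what fraction is actually pregnant?
ppp = true_positives / (true_positives + false_positives)
print(ppp)  # 0.02 -- in this hypothetical, only 2% of positive results are real
```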
In summary, the concepts of sensitivity, specificity, and positive predictive power are incredibly important when weighing the reliability of any type of test on which an expert relies to form an opinion. Questioning experts on this aspect of a test’s reliability can help bolster their overall credibility (if they relied on good tests), or it can significantly lower their credibility (if the tests they used are bad).