Why don’t we know how many people have been infected with Covid-19 in South Africa?

By Max Price
05 May 2020

Of the many gaps in our knowledge about Covid-19, one of the most significant is the actual rate of infection, i.e. how many people are being infected. This article has two key messages: why knowledge of the infection rate is so critical to our decisions about how to manage the epidemic, and why it is so difficult at present to know what the infection rate is. Putting these messages together, the article is also about why we are still fumbling in the dark to determine the best strategy.

As of 29 April 2020, the number of confirmed infections in South Africa was about 5,000. This comprises mostly patients who developed symptoms and came to a health facility of their own accord. The balance are individuals whose symptoms were detected through community screening and who were referred for a test and found to have the virus. But what if there are many people who never develop symptoms, and so don’t come to health facilities and don’t get detected by the community-based screening teams?

From the early China experience, we have known that there are individuals who acquire the infection and themselves become infectious to others, but do not develop symptoms. Those studies suggested that between 20% and 50% of those infected develop symptoms. The corollary of this is that for every patient we know of with symptoms, there are likely to be another one to four who are infected, but do not have symptoms.

But over the last two weeks, surveys of the general population (not those who were ill) in several countries have suggested that the actual number who had been infected might be five to 50 times the number who had become ill. For example, a study in Santa Clara, California, concluded the actual number of infections was 50 times higher than the number that had manifested with symptoms and been confirmed.

A random survey of customers in New York found that 14% of the population had already been infected and recovered, which would mean that there were 2.7 million people in the state who had at some point been infected with Covid-19. But the state had only recorded one-quarter of a million confirmed cases at the time of the survey. This means that 90% of cases had been missed – presumably because most had had very mild or no symptoms. 

There have been similar findings from a survey in the Netherlands, which suggested that half a million people had been infected although there were only 28,000 confirmed cases, i.e. roughly 18 times more. These studies have evoked much debate, criticism and caution – more on this later.

If, and it is a big IF, these findings are correct and transferable to South Africa, then the implications in South Africa are that as compared to the 5,000 cases we know about, 25,000 to 250,000 people might already have been infected.

The implications for managing the epidemic are dramatic.

First, it would mean that the severity of the disease was actually quite low. (To simplify the explanations, all figures have been rounded and I will use a single number rather than a range.) Let’s say the real number of infections in SA is 100,000. Instead of 100 deaths out of 5,000 cases (2%), the true death rate would be 100 deaths out of 100,000, or 0.1% – similar to annual flu. 

And if, by the end of the epidemic, 40 million people became infected, that would lead in total to 40,000 deaths. That compares with 78,000 deaths from TB in 2018, and about 500,000 deaths in total in 2018. The severity of our response and the economic costs we are willing to bear would be quite different compared to the currently anticipated total of 800,000 deaths. (I say currently anticipated since, if there are truly only 5,000 cases and 100 deaths, i.e. a Case Fatality Ratio of 2%, 40 million infections will result in 800,000 deaths.)
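The arithmetic behind these two scenarios can be checked with a short sketch. It uses only the rounded figures given above (100 deaths, 5,000 or 100,000 infections, 40 million eventual infections); the function name `projected_deaths` is a hypothetical helper introduced here for illustration, and the calculation assumes the deaths-per-infection ratio stays constant as the epidemic grows.

```python
def projected_deaths(deaths_so_far, infections_so_far, eventual_infections):
    """Scale the observed deaths-per-infection ratio up to a larger epidemic."""
    fatality_ratio = deaths_so_far / infections_so_far
    # Multiply before dividing so the round figures stay exact.
    total_deaths = deaths_so_far * eventual_infections / infections_so_far
    return fatality_ratio, total_deaths


DEATHS = 100                      # rounded figure from the article
EVENTUAL_INFECTIONS = 40_000_000  # rounded figure from the article

scenarios = {
    "confirmed cases only (5,000)": 5_000,
    "assumed true infections (100,000)": 100_000,
}

results = {}
for label, infections in scenarios.items():
    ratio, deaths = projected_deaths(DEATHS, infections, EVENTUAL_INFECTIONS)
    results[label] = (ratio, deaths)
    print(f"{label}: fatality ratio {ratio:.1%}, projected deaths {deaths:,.0f}")
```

The same 100 deaths give a projection of 800,000 or 40,000 depending solely on the denominator, which is why the true number of infections matters so much.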

The second fundamental implication of having 100,000 rather than 5,000 cases, is to question the value of continuing the lockdown strategy. From the reduction in the rate of new cases that occurred from 27 March 2020, it is very likely that the restrictions imposed under the State of National Disaster slowed transmission and bought the country time to prepare for the surge. However, if in spite of the lockdown we still have 100,000 infections, then the control is less effective than we thought. It is, of course, possible that, absent the restrictions, we would have had a million infections and 1,000 deaths. We may be ready to handle that now.

The third implication is for other strategies aimed at interrupting transmission. For example, the huge effort going into contact tracing, quarantining and isolation is justified if it is reaching most of those who get infected, to stop them infecting others. But if it is only reaching 5% of those infected, and the other 95 out of every 100 infected individuals are asymptomatic and out in the community infecting others, then the quarantining strategy of those not yet infected is going to have very little impact and is probably not worth pursuing.

On the other hand, universal masking of everyone, symptomatic or not, combined with personal hygiene, will be much more effective as it is reducing transmission by all those asymptomatic infectious people in the community.

The fourth implication concerns the herd immunity threshold. This is the proportion of the population that needs to become immune before the epidemic slows down and eventually fizzles out. In South Africa, with modest physical distancing and no lockdown, it is assumed that this will be about 60%, or 36 million. In the absence of a vaccine, this occurs only through infection. But if 100,000 people have already been infected, then we will approach the herd immunity level far earlier than if only 5,000 have been infected.

Clearly, knowing what the real infection rate is in South Africa is critical to determining our strategy. So why is it so difficult to establish how many have been infected?

To explain this we need first to understand the two main categories of tests that are done.

The first, currently the mainstay of diagnosis everywhere in the world, is the RT-PCR (Reverse Transcription Polymerase Chain Reaction), which detects fragments of the virus genetic material present on the swab specimen from the back of the throat. This proves that the virus is present. But after recovery, the virus is no longer present, so the test will turn negative. This test thus cannot be used in a survey of healthy people to find out how many were previously infected and have recovered. And since there may be so many people with asymptomatic infections who never come for RT-PCR testing, we cannot rely on that figure to know how many infections there have been.

The second group of tests looks for antibodies in a blood sample. The antibodies have been formed by the body in reaction to the infection and last a long time after the infection has gone. This ongoing presence of antibodies usually provides us with immunity should the virus attack us again, though it does not do so for all diseases, and we don’t know for sure regarding Covid-19. But we do know that the antibodies to Covid-19 are present in the blood for months post-infection. It is these antibody tests that are used for assessing the level of past infection in the community.

There are now many antibody tests that have been developed around the world with varying reliability. So what prevents us from doing a random blood survey of a representative sample of the population to find out how many have been infected? The most important reason has to do with the problem of false-positive results in populations where the infection rate is very low. This is also the crux of the criticisms of the other studies mentioned above.

A good quality test would aim to have a specificity of about 99%. This means that if you test 100 people who have not had the Covid-19 infection, and therefore do not have antibodies, the test will give the correct result 99 times, but may incorrectly classify one of the 100 as positive. There may be many reasons for this. To give just one example, the test may weakly cross-react with another coronavirus that the population has been exposed to in the past and may show a positive result in someone who has high antibody levels to that other virus, but has not had the Covid-19 virus.

In the South African population at present, we know of 5,000 infections, and we also think there may be at least an equal number that had mild symptoms and did not come for testing. So let’s assume the “real” number of accumulated infections is 10,000. As a proportion of the population of 60 million, that is 0.017% or 1 in 6,000. 

If we conduct a survey of 6,000 people randomly selected, and assuming the test is good at detecting real cases, it will pick up the one antibody-positive person. But with a false positive rate of 1%, it will also report that another 60 people are positive. Since we cannot know that those are false positives, we will think there are 61 cases in a sample of 6,000.

In other words, we will overestimate the number of people who have been infected 60-fold, inferring that there were 600,000 cases already in South Africa.  

But clearly that would be wrong. Even if we had a brilliant test with a 99.9% specificity, we would still find six false positives in our sample of 6,000 alongside the one true positive, and we would incorrectly conclude that instead of 10,000 cases, there were actually about 70,000. There are currently no tests with better than 99% specificity, and most are lower.

Note that the error rate is much less significant as the epidemic takes off. When the actual prevalence of Covid-19 antibodies gets to 10% of the population, then in a sample of 6,000, where 600 are confirmed true positives, there would still be an additional 54 false positives (1% of 5,400) – but this gives a total positive rate that was only 9% higher than the true value (654 compared with 600 out of 6,000). [See footnote.]
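The survey examples above can be reproduced with a minimal sketch. It computes expected counts only (no random sampling) and assumes, as the text does, that the test detects every truly infected person. The function `expected_survey` and its field names are hypothetical, introduced here for illustration; the population, sample size and specificities are the article's figures.

```python
def expected_survey(true_infections, population, sample_size, specificity):
    """Expected antibody-survey counts, assuming every true case tests positive."""
    prevalence = true_infections / population
    true_positives = prevalence * sample_size
    # False positives arise only among the people who were never infected.
    false_positives = (1 - specificity) * (sample_size - true_positives)
    observed = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "implied_infections": observed / sample_size * population,
        "overestimate_factor": observed / true_positives,
    }


POPULATION = 60_000_000
SAMPLE = 6_000

low_99 = expected_survey(10_000, POPULATION, SAMPLE, specificity=0.99)
low_999 = expected_survey(10_000, POPULATION, SAMPLE, specificity=0.999)
high_99 = expected_survey(6_000_000, POPULATION, SAMPLE, specificity=0.99)

for name, r in [("0.017% prevalence, 99% specific", low_99),
                ("0.017% prevalence, 99.9% specific", low_999),
                ("10% prevalence, 99% specific", high_99)]:
    print(f"{name}: about {r['false_positives']:.0f} false positives, "
          f"implied infections {r['implied_infections']:,.0f}, "
          f"overestimate x{r['overestimate_factor']:.2f}")
```

With the same 99% specificity, the overestimate falls from roughly 60-fold at the assumed current prevalence to about 9% once prevalence reaches 10%.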

What do I take away from all this uncertainty?

First, that the studies around the world showing that the number of people infected is 10 to 50 times higher than the number confirmed must be treated with caution because a significant portion of that “excess” is likely a result of the false-positive rates of the tests. We have seen that such a study, if conducted in South Africa now, might well suggest that there are 60 times more cases than there are. Nevertheless, there is good reason to think the rates are significantly higher than those we have confirmed with RT-PCR – probably at least four times higher, i.e. about 20,000 infections, but quite possibly many more.

Second, with the low levels of infection in South Africa at present, well under 1%, any testing will throw up false positives that are a multiple of the true number. With the current battery of tests, we cannot get useful results until the true rate is above 1%. We have to keep doing surveys to estimate that prevalence. We just have to discard the results until they exceed the false positive rate.

Third, it ought to be possible to use combinations of tests simultaneously which can achieve higher specificity together, but these need to be validated in local populations. This should be done as soon as possible.

Fourth, that the strategies for interrupting transmission should be different if we discover that the number of asymptomatic infections is significantly higher than the number of symptomatic ones. I know I will be asked – “what is ‘significant’?” I don’t know, but would speculate that it would need to be at least 10 times higher.

Given the economic and social costs of these strategies, we must adapt them as soon as better information on infection rates becomes available and in some cases we should make calculated guesses on the impact of these interventions given uncertainty about true infection rates.

In the meantime universal masking is one strategy in particular that is very low cost and is all the more critical if indeed the asymptomatic infections are much higher than we thought, as suggested by the international studies. DM

FOOTNOTE: For purposes of explanation, I have treated the false positive rate as an exact number. In reality, the specificity of a test in a given setting will always be a range, or a confidence interval, as a result of random sampling errors and variability in the test and the population.

So the specificity of the test might in reality range from 97% to 100%, and in the example above, the false positives could have been anywhere from 0 to 180 out of 6,000, with only one true positive. This would imply a number of past infections in the population of anywhere between zero and 1.8 million, as compared with the 5,000 known cases. One way of dealing with false positives is to subtract the number of false positives that would be expected based on the known specificity of the test, which should leave the true positives. In this example, one would subtract 60 from 61 to get the one true positive in the sample of 6,000.

However, because the false positive rate is a range, not a single number, you would have to subtract the estimate of the largest possible number of false positives, in this case, 180. If the test showed only 61 positives, then that would lead to a negative number – clearly nonsense.

However, if the prevalence had been 10%, with 654 positives (600 true positives and 54 false positives) then one could subtract the high estimate of false positives assuming the test had only been 97% specific, i.e. 180, leaving a result of 474 positives out of 6,000, i.e. 7.9%. That is a conservative estimate of the truth (10%), but not by orders of magnitude, and still very useful. Indeed statisticians can do a number of other manipulations to achieve a more accurate result.
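The subtraction described in this footnote can be sketched as follows. This is only the simple worst-case correction from the text, not one of the more refined statistical adjustments alluded to above; the function name `conservative_true_positives` is hypothetical, and the 97% worst-case specificity and sample of 6,000 are the footnote's own figures.

```python
def conservative_true_positives(observed_positives, sample_size, worst_case_specificity):
    """Subtract the largest plausible number of false positives from the observed count."""
    # As in the footnote, the worst-case false positive rate is applied to the whole sample.
    max_false_positives = (1 - worst_case_specificity) * sample_size
    return observed_positives - max_false_positives


SAMPLE = 6_000
WORST_CASE_SPECIFICITY = 0.97

corrected = {}
for observed in (61, 654):
    remaining = conservative_true_positives(observed, SAMPLE, WORST_CASE_SPECIFICITY)
    corrected[observed] = remaining
    if remaining <= 0:
        print(f"{observed} observed positives: correction gives {remaining:.0f}, "
              "so the survey is uninformative")
    else:
        print(f"{observed} observed positives: at least {remaining:.0f} true positives, "
              f"i.e. a prevalence of at least {remaining / SAMPLE:.1%}")
```

At low prevalence the corrected count is negative and tells us nothing; at 10% prevalence it gives a lower bound of about 7.9%.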

In general, though, a test like this becomes useful for this purpose of establishing population prevalence when the true prevalence is greater than the highest estimate of the test’s false positive rate. In this example, if the highest estimate of the false positive rate is about 3% then the test becomes useful when the true population prevalence exceeds about 4%, but will still report a very wide confidence interval from about 0.5 to 4.5%. But at least we would know that it was above 0.5% – we may have no other way of finding that out.

Dr Max Price is a Non-Resident Fellow of the Centre for Global Development, and the former Vice-Chancellor of the University of Cape Town. He is a medically qualified public health expert and was formerly dean of the Faculty of Health Sciences at the University of the Witwatersrand.

