The World Health Organization, Samaritan's Purse, Doctors Without Borders, and other international medical emergency relief programs are desperately calling for additional resources in the international fight against Ebola, which has already killed thousands and is likely to kill thousands more even if a full arsenal of aid were made available.
The World Health Organization released a statement on August 28th that the epidemic could afflict more than 20,000 people before it could be brought under control. This, however, assumes full international backing for an intervention to control the deadly outbreak. Failure to fully support the WHO's plan would presumably cause the disease to continue to spread in the same manner as it already has.
At first a figure as high as 20,000 seems exaggerated, especially when set against the roughly 3,000 cases reported on the day of the announcement. However, I believe that this estimate is vastly too small and depends entirely on an effective and well-funded international relief mission. Using a projection from all of the WHO reports to date, I calculate that if the disease continues to spread at its current rate, we will have more than 20,000 cases by October 24.
The report also states that it will likely take six to nine months to stop the epidemic. However, if nothing changes and the epidemic continues to rage as it currently does, my projections estimate that as many as 4.7 million people will have been infected and 1.2 million will have already died.
These are extremely dire predictions and I hope that they are purely extrapolations based on what will later be seen as the scariest month of this epidemic. However, the exponential growth model fits the data very well and at least in the short term should be expected to be fairly accurate.
All of this analysis is done in R and the code can be found on GitHub.
From 41 CDC Ebola reports I have assembled a small database of cases by country listing the number of 'Suspected or Confirmed Cases', the number of 'Deaths' suspected to be associated with Ebola, and the number of 'Laboratory Confirmed' cases of Ebola. You can find and add to this database as a Google spreadsheet here. If you run the code yourself, it will import the spreadsheet data directly.
Mapping this data by country and fitting a local polynomial regression to give a fitted line for each country shows signs of a very disturbing trend. The current outbreak originated in Guinea, and though the disease continues to claim new victims there, it is much less worrisome than in Sierra Leone and Liberia, where the numbers of suspected cases and deaths are growing exponentially.
By exponential growth, we mean that whatever the current number of infected people is, we can expect them to infect some additional number of people proportional to the transmission rate. The problem with exponential growth is that while the number of new victims can start out small, the more victims there are, the more are likely to be added to the victim pool each day.
When we look at the total numbers of each type of case summed across countries, we arrive at the above graph.
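To make the compounding concrete, here is a minimal sketch in R. The starting count of 100 cases and the 2% daily rate are hypothetical values chosen only for illustration; they are not estimates from the data.

```r
# Hypothetical inputs, chosen only for illustration.
initial_cases <- 100    # total cases on day 0
daily_rate    <- 0.02   # each day adds 2% of the current total

days  <- 0:90
cases <- initial_cases * (1 + daily_rate)^days

# New cases added each day: small at first, larger as the total grows.
new_cases <- diff(cases)

# Number of days needed for the total to double at this rate.
doubling_time <- log(2) / log(1 + daily_rate)

print(round(head(new_cases, 3), 2))  # first few daily increments
print(round(tail(new_cases, 3), 2))  # last few daily increments
print(round(doubling_time, 1))       # about 35 days
```

Even at a constant rate, the daily increment keeps rising because it is a fixed share of an ever larger total.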
From this graph it is clear that a direct linear model cannot fit well at all. Suspecting that the change over time might fit an exponential growth function, I take the natural log of the values mapped above.
This new transformed graph provides an extremely disturbing confirmation that an exponential growth model is an appropriate way of modelling the spread of Ebola. In order to estimate the spread of Ebola I define a simple model with a constant and a linear relationship between days since the outbreak was announced and the log of the measure we are concerned with:
$$log(Y)=\alpha+\beta_1 Day$$
I estimate the model with observation weights based on the number of days into the reporting period, so that more recent observations are considered more important. I also discard the observations for the first 21 days because we can expect that the preliminary data at that time was less accurate. Using the above model gives Day coefficients of 0.02245505 for suspected cases, 0.02096758 for deaths, and 0.02314866 for laboratory confirmations (with intercepts of 4.38881946, 4.00491144, and 3.86052949 respectively).
While intercept estimates are generally considered to be less important, the coefficients on Day can be interpreted approximately as percent changes per day. Thus we can expect from the current data that each day we will have a little over 2% additional suspected cases, deaths, and laboratory confirmations.
In order to allow the model to be a little more flexible in my projections, I include a slightly more complex model with a squared term for the days since announcement.
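A weighted log-linear fit of this kind could be sketched in R as follows, using the observed suspected-case totals from the table in this post. The weighting scheme (weights equal to the day number) is an assumption made for illustration, since the exact weights are not stated here, so the resulting coefficients need not match the reported ones.

```r
# Observed totals of suspected cases, copied from the table in this post.
day <- c(1, 2, 3, 7, 8, 9, 14, 17, 24, 28, 30, 37, 42, 51, 60, 65,
         70, 73, 78, 79, 86, 92, 100, 105, 106, 112, 114, 122, 126, 129,
         132, 133, 137, 141, 142, 144, 148, 150, 151, 157)
susp <- c(86, 86, 103, 112, 122, 127, 151, 157, 197, 203, 208, 221, 231,
          233, 258, 290, 341, 425, 462, 494, 528, 599, 759, 779, 844, 888,
          964, 1048, 1201, 1323, 1439, 1603, 1779, 1848, 1975, 2127, 2240,
          2473, 2561, 3069)
obs <- data.frame(day, susp)

# Drop the first 21 days of preliminary data.
recent <- subset(obs, day > 21)

# log(Y) = alpha + beta_1 * Day, with later observations weighted more.
# ASSUMPTION: weights equal to the day number; the original weights may differ.
fit <- lm(log(susp) ~ day, data = recent, weights = day)

print(coef(fit))
```

The `day` coefficient is the estimated growth in log cases per day.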
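The percent reading of a log-model coefficient is an approximation that works well when the coefficient is small; the exact daily growth factor is exp(β1). As a rough check, the sketch below converts the Day coefficients reported for this model into exact daily percent changes and implied doubling times. The variable names are my own.

```r
# Day coefficients reported for the log-linear model in this post.
beta_day <- c(Suspected  = 0.02245505,
              Deaths     = 0.02096758,
              Laboratory = 0.02314866)

# Exact daily percent growth implied by each coefficient.
daily_pct <- 100 * (exp(beta_day) - 1)

# Days for the count to double if the rate stays constant.
doubling_days <- log(2) / beta_day

print(round(daily_pct, 2))      # roughly 2.27, 2.12, 2.34
print(round(doubling_days, 1))  # roughly 30.9, 33.1, 29.9
```

At these rates each count doubles in about a month.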
$$log(Y)=\alpha+\beta_1 Day+\beta_2 Day^2$$
I use this model to project suspected cases, deaths, and laboratory results for the next three weeks. The values up until today show the comparison between the expected values estimated from the model (EDeaths, ESusp, and ELab) and those from the data (Deaths, Susp, and Lab). We can see the model fits the data quite well, with all estimates through August 28 within about 150 of the observed values and most much closer. Using this model we can see that the totals are expected to be around 3,500 deaths and 7,200 suspected cases by September 24. Things just get worse from there.
date        day  Deaths  EDeaths  Susp  ESusp   Lab  ELab  Note
2014-03-25    1      59       89    86    140     0    49
2014-03-26    2      60       90    86    141     1    50
2014-03-27    3      66       90   103    143     4    51
2014-03-31    7      70       94   112    149    24    56
2014-04-01    8      80       95   122    151    24    57
2014-04-02    9      83       97   127    152    35    59
2014-04-07   14      95      102   151    161    52    66
2014-04-10   17     101      106   157    167    66    71
2014-04-17   24     122      115   197    182   101    83
2014-04-21   28     129      122   203    192   109    91
2014-04-23   30     136      125   208    197   112    95
2014-04-30   37     146      137   221    218   126   112
2014-05-05   42     155      148   231    235   127   126
2014-05-14   51     157      169   233    270   129   155
2014-05-23   60     174      196   258    315   146   190
2014-05-28   65     191      213   290    344   170   214
2014-06-02   70     199      232   341    377   186   240
2014-06-05   73     222      245   425    399   238   257
2014-06-10   78     244      269   462    440   253   289
2014-06-11   79     261      274   494    449   277   296
2014-06-18   86     337      313   528    517   364   348
2014-06-24   92     338      353   599    587   441   399
2014-07-02  100     467      416   759    700   544   481
2014-07-07  105     481      462   779    784   557   540
2014-07-08  106     518      472   844    803   626   552
2014-07-14  112     539      539   888    925   664   634
2014-07-16  114     601      564   964    971   706   665
2014-07-24  122     632      677  1048   1183   745   800
2014-07-28  126     672      744  1201   1311   814   877
2014-07-31  129     728      800  1323   1417   909   941
2014-08-03  132     826      860  1439   1533   953  1008
2014-08-04  133     887      882  1603   1574  1009  1032
2014-08-08  137     961      974  1779   1753  1134  1132
2014-08-12  141    1013     1077  1848   1956  1176  1242
2014-08-13  142    1069     1105  1975   2011  1251  1271
2014-08-15  144    1145     1163  2127   2127  1310  1332
2014-08-19  148    1229     1290  2240   2381  1383  1461
2014-08-21  150    1350     1360  2473   2522  1460  1530
2014-08-22  151    1427     1397  2561   2596  1528  1566
2014-08-28  157    1552     1641  3069   3094  1752  1800
2014-08-29  158      NA     1686    NA   3188    NA  1842
2014-08-30  159      NA     1733    NA   3284    NA  1885
2014-08-31  160    1841     1782  3685   3384    NA  1930  Updated 9/4
2014-09-01  161      NA     1831    NA   3488    NA  1975
2014-09-02  162      NA     1883    NA   3595    NA  2021
2014-09-03  163      NA     1936    NA   3705    NA  2069
2014-09-04  164      NA     1991    NA   3820    NA  2117
2014-09-05  165    2097     2047  3944   3939    NA  2167  Updated 9/11
2014-09-06  166      NA     2106    NA   4062    NA  2218
2014-09-07  167      NA     2166    NA   4189    NA  2270
2014-09-08  168      NA     2228    NA   4321    NA  2323
2014-09-09  169      NA     2292    NA   4457    NA  2378
2014-09-10  170      NA     2359    NA   4599    NA  2433
2014-09-11  171      NA     2427    NA   4745    NA  2491
2014-09-12  172      NA     2498    NA   4897    NA  2549
2014-09-13  173      NA     2572    NA   5055    NA  2609
2014-09-14  174    2630     2647  5347   5218  3095  2670  Updated 9/18
2014-09-15  175      NA     2725    NA   5386    NA  2733
2014-09-16  176      NA     2806    NA   5562    NA  2797
2014-09-17  177      NA     2890    NA   5743    NA  2863
2014-09-18  178      NA     2976    NA   5931    NA  2930
2014-09-19  179      NA     3065    NA   6126    NA  2998
2014-09-20  180      NA     3157    NA   6329    NA  3069
2014-09-21  181    2917     3253  6263   6539  3487  3141  Updated 9/25
2014-09-22  182      NA     3351    NA   6756    NA  3215
2014-09-23  183      NA     3453    NA   6982    NA  3290
2014-09-24  184      NA     3559    NA   7217    NA  3367
...
2014-09-30  190    3441       NA  7470     NA  4087    NA  Updated 10/3
...
2014-10-03  193    3857       NA  8011     NA  4440    NA  Updated 10/8
...
2014-10-08  198    4033       NA  8400     NA  4656    NA  Updated 10/10
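A projection table like the one above could be produced along these lines. This is only a sketch under stated assumptions: it uses the suspected-case column alone, takes weights equal to the day number (the actual weights are not given in this post), and introduces its own variable names, so its numbers will not reproduce the ESusp column exactly.

```r
# Observed totals of suspected cases, copied from the table above.
day <- c(1, 2, 3, 7, 8, 9, 14, 17, 24, 28, 30, 37, 42, 51, 60, 65,
         70, 73, 78, 79, 86, 92, 100, 105, 106, 112, 114, 122, 126, 129,
         132, 133, 137, 141, 142, 144, 148, 150, 151, 157)
susp <- c(86, 86, 103, 112, 122, 127, 151, 157, 197, 203, 208, 221, 231,
          233, 258, 290, 341, 425, 462, 494, 528, 599, 759, 779, 844, 888,
          964, 1048, 1201, 1323, 1439, 1603, 1779, 1848, 1975, 2127, 2240,
          2473, 2561, 3069)
recent <- subset(data.frame(day, susp), day > 21)

# log(Y) = alpha + beta_1 * Day + beta_2 * Day^2
# ASSUMPTION: weights equal to the day number.
fit2 <- lm(log(susp) ~ day + I(day^2), data = recent, weights = day)

# Project the days after the last observation (day 157 = 2014-08-28).
future <- data.frame(day = 158:184)
future$ESusp <- round(exp(predict(fit2, newdata = future)))
future$date  <- as.Date("2014-03-24") + future$day  # day 1 = 2014-03-25

print(head(future[, c("date", "day", "ESusp")]))
```

Because the fit is on the log scale, predictions are exponentiated to return to case counts.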
Falseness of my Model
This model by definition cannot be true globally (into the distant future). This is obvious when we use the model to project out to one year. At one year the number of infected cases is estimated as 436 billion. Since the entire population of the Earth is only 8 billion or so we know that this cannot be true.
However, this kind of model can be a good approximation locally (in the near future). If it is a good approximation locally then the next WHO report is going to list around 2100 deaths and 4060 suspected cases as of today.
So, I ask the question, "is 1.2 million deaths a projection which is either local or global?" I cannot answer this, but it certainly is within the realm of feasibility since the nation of Liberia alone has over 4 million people and Guinea 10 million and Sierra Leone 6 million. The real question becomes, "do we think the ability of Liberia and other afflicted nations to control the spread of Ebola will increase, decrease, or remain the same over time?"
From Figure 7 we can see that Liberia is significantly behind other nations in its ability to diagnose Ebola. This, and the well-known lack of medical facilities, suggests to me that as the crisis escalates, Liberia's ability to maintain any sense of order, and with it any hope of controlling the spread of the disease, is likely to degrade. If this is the case then it is quite possible that even this horrifying projection is an underestimate of the pain and carnage likely to result from this outbreak.
What to Do?
News reports and the governments they are reporting on seem to have been placing a good deal of emphasis on investing in vaccines and treatment options. However, while all of these options are good, they are long term options (6 to 12 months). In the meantime, every resource available must be used to contain and restrict the spread of this outbreak.
It is extremely foolish to think that any nation is immune to this disease. So far in the entire history of Ebola outbreaks up until the present, fewer than 10 thousand people have been infected. This relatively low infection count coupled with rapid mortality makes it unlikely that the disease will significantly mutate among the human population.
However, if my projections are anywhere close to accurate, then the number of infected people is going to be much higher than has ever occurred previously. This will create many more habitats in which the virus could possibly mutate and acquire new traits that increase its transmission rate. These mutations could take the form of longer incubation periods, which might lead to a greater time between being infectious and being detectable.
Another possible trait might be the ability to go airborne, which would significantly increase its ability to be transmitted. Some scientists consider it very unlikely to become airborne because it is too heavy. This may be the case. However, as the possibility of it becoming airborne could result in a global spread of the disease and an unprecedented number of deaths worldwide, it is more than prudent to invest heavily in controlling the number of new patients infected by this disease.
In addition, even if the disease does not mutate from the state that it is in currently to a new one, it has shown itself to be extremely effective at being transmitted with a large number of health workers becoming infected and dying from the disease. These health workers should have known how to control the spread of the disease and prevent infection. Do we really expect that if the disease were to enter any other nation on Earth that the general population is going to be better prepared to protect themselves than the specialists who have already fallen victim to this disease?
Thus, it is imperative that we do everything within our power to control the spread of this terrible disease. Even if my model only has a ten percent chance of being accurate over the next six months, we would be extremely foolish to risk not responding to this outbreak with every resource within reason humanity can muster.
Figure 1: The lines are projected values while the points are data points.
Figure 2: The increase of deaths in Liberia is much steeper than the other two heavily afflicted countries of Guinea and Sierra Leone.
Figure 4: The increase of deaths in Liberia is much steeper than the other two heavily afflicted countries of Guinea and Sierra Leone.
Figure 5: The total number of cases is rising extremely quickly.
Figure 6: A log transformation of the total number of cases creates a relatively linear relationship between time and number of cases reported.
Model coefficient estimates:
             Intercept         Day
Suspected   4.38881946  0.02245505
Deaths      4.00491144  0.02096758
Laboratory  3.86052949  0.02314866
I hate to say this but if it appears that things start to spike, as your graphs indicate, the only hope to stop the spread will be cautery, :(
Thanks for your time to do this analysis. Some data could not be related with real date, for example WHO reported 2296 deaths as of 6 sept not 9 sept in your chart. Maybe this change the model?
Thanks for your comment Byran. I had difficulty finding the original source of the WHO numbers so I ended up deciding to just drop data points I could not source. Now that there are updated reports that I can directly reference I am happy referencing those. By the way, this does not change my model since the post was written with data only up until August 28th. I will post something again soon with an updated model.
Thanks Francis, this is very interesting work.
Now that we have a few more weeks of data, how is the model from mid sept holding up?
Thank you for a great post! Please keep updating it!
Thank you for this analysis. I am trying to run the code in RStudio and get an error message at this line:
# I have borrowed Andrie's code from stackoverflow
# http://goo.gl/noYVo7
source("http://goo.gl/w64gfp")
The message is:
Error in file(filename, "r", encoding = encoding) :
cannot open the connection.
When I open http://goo.gl/w64gfp in Google Chrome it opens the list of functions as described.
source() seems to work on other webpages. I tried
source("http://bioconductor.org/biocLite.R") which opened normally.
As a work around, I downloaded the CSV and modified the code and used the Excel base date:
ebola <- read.csv("Ebola - Countries.csv", header=TRUE, sep = ",") # %>% as.data.frame
ebola$Date <- as.Date(ebola$Date, origin = "1899-12-30")
ebola$Day <- ebola$Date-min(ebola$Date)
But I am interested to know why the "source("http://goo.gl/w64gfp")" does not work?
Thank you
Modeling is useful in many cases. However, this outbreak is primarily due to cultural traditions. Thus, human behavior is something that models are not able to predict. There have been 24 outbreaks of ebola since 1976, this data would also be useful to consider when making a model, since those outbreaks took place in other countries that do not have the same traditions, thus a more accurate representative of infection rates.
Also, as you can now see the high death rates are likely caused by the lack of medical facilities. Those treated in modern facilities are having much higher survival rates.
So these are other factors that must be taken into consideration when creating a mathematical model.
I appreciate your feedback, but I am confused as to what you are suggesting. Including other factors in the model necessitates the development of an alternative model. In this case, the 24 previous outbreaks were both much smaller in scale and much more localized, resulting in outbreaks that were containable given the resources available.
Overall, the point of this model is not to explain every Ebola epidemic that has ever occurred but to provide simple estimates of how the epidemic might develop. Ultimately we are constrained by our data. It is not possible to include every possible factor that could explain how this epidemic might have occurred and spread since the data is extremely limited.