As COVID-19 becomes a worldwide pandemic, more and more people and organizations are using mathematical modelling to predict its trend. News reports such as the one below appear from time to time:
Figure 1: Media reported that math models could be used to predict the outbreak and cope with the spread of COVID-19.
As the report suggests, mathematical models can help governments predict the outbreak and cope with the spread of COVID-19. Mathematical modelling is also part of the IB Math syllabus. Some IB students at Gabe Math have already started looking for interesting topics for their IB Math IA (the internal assessment, which makes up 20% of the final IB Math score), and they are wondering whether it is possible to write an IA on COVID-19 using mathematical modelling.
Therefore, in order to help these students, Gabe Math decided to write this article.
Figure 2: Contents of the IB Math textbook on mathematical models.
The IB textbook covers mathematical models such as linear, polynomial, exponential, logarithmic and trigonometric models. Of these, the one most commonly used to model the total number of COVID-19 cases over time is the exponential model.
Why the exponential model? To answer this question, we need to introduce the basic reproduction number, R0. R0 is essentially the average number of successful offspring that a parasite is intrinsically capable of producing; more precisely, it is the average number of secondary infections produced when one infected individual is introduced into a host population where everyone is susceptible <1>. In brief, R0 is the average number of people one infected person infects. Because each infected person passes the disease on to about R0 others while nearly everyone is still susceptible, the number of cases is multiplied by roughly the same factor in each generation of infection, which is exactly the pattern an exponential model describes. For an infection to become an epidemic, its R0 must be greater than 1; otherwise each generation of infections is smaller than the last and the disease dies out before becoming an actual epidemic.
Figure 3: The news about the exponential growth of COVID-19
According to published research, the R0 of COVID-19 is about 2.2 (95% confidence interval: 1.4 to 3.9) <2>.
For example, suppose 10 people are initially infected with COVID-19 and nothing is done to stop them from spreading the virus to other people. If the total number of cases grows by 31.4% per day, then by the end of the first day there would be 3.14 new cases and 13.14 total cases; by the end of the second day there would be 4.13 new cases and 17.27 total cases. If this continues, by the end of the first week (7 days) there would be 67.6 total cases.
The general exponential model relating time (in days, d) to total cases (N) can therefore be written as $N = 10 \times (1 + 0.314)^{d}$.
Some students may wonder: if there are only about 70 total cases after a whole week in the example above, why have so many countries already reported more than 10,000 COVID-19 cases? The reason is that when the time increases to one month (d = 30), the model gives $N = 10 \times (1 + 0.314)^{30} \approx 36129$. The total at the end of the first month is therefore more than 500 times the total at the end of the first week, which shows how astonishingly fast an exponential model grows when it is used to describe the spread of a virus.
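The arithmetic in this example can be reproduced with a short Python sketch. It assumes, as the example does, 10 initial cases and a fixed daily growth rate of r = 0.314; the function names are illustrative only. Stepping day by day and using the closed-form formula should give the same totals.

```python
def total_cases(n0: float, r: float, days: int) -> float:
    """Closed form of the exponential model N = N0 * (1 + r) ** d."""
    return n0 * (1 + r) ** days


def simulate_daily(n0: float, r: float, days: int) -> list[tuple[int, float, float]]:
    """Step the model one day at a time, returning (day, new cases, total cases)."""
    rows = []
    total = n0
    for day in range(1, days + 1):
        new = total * r  # each day's new cases are r times the current total
        total += new
        rows.append((day, new, total))
    return rows


if __name__ == "__main__":
    for day, new, total in simulate_daily(10, 0.314, 7):
        print(f"day {day}: new = {new:6.2f}, total = {total:7.2f}")
    # The closed form agrees with the day-by-day loop.
    print(f"after 7 days:  {total_cases(10, 0.314, 7):.1f}")
    print(f"after 30 days: {total_cases(10, 0.314, 30):.0f}")
```

Each day's new cases are r times the previous total, so the second day adds about 4.13 cases, and the loop reaches the same 7-day and 30-day totals as the formula.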
To illustrate the exponential model with real data, we use the total number of cases in the U.S. from February 25th to March 31st (the data are listed at the end of this article; February 25th is taken as the first day, d = 1). Assume the relation between time d and total cases N is exponential, $N = N_0 \times (1 + r)^{d}$. Taking the natural logarithm of both sides turns it into a linear equation, $\ln N = \ln N_0 + d \ln(1 + r)$, so ln N plotted against d is a straight line with gradient ln(1 + r) and intercept ln N0, and these parameters can be found by linear regression.
Figure 4: The scatter plot (blue) and exponential regression model (purple) of total cases in the U.S. over time from February 25th to March 31st.
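The linearisation can be sketched in Python: take ln N, fit a least-squares straight line against d, then transform the gradient and intercept back into N0 and r. Because the article's data table is not reproduced here, the usage example runs on made-up data generated from assumed values N0 = 10 and r = 0.34 with a small artificial wobble; it illustrates the method and is not a reproduction of Figure 4. The helper name `fit_exponential` is hypothetical.

```python
import math


def fit_exponential(days, totals):
    """Fit N = N0 * (1 + r) ** d by least squares on ln N = ln N0 + d * ln(1 + r).

    Returns the estimates (N0, r).
    """
    logs = [math.log(n) for n in totals]
    n = len(days)
    mean_d = sum(days) / n
    mean_log = sum(logs) / n
    s_dd = sum((d - mean_d) ** 2 for d in days)
    s_dlog = sum((d - mean_d) * (y - mean_log) for d, y in zip(days, logs))
    gradient = s_dlog / s_dd                  # estimate of ln(1 + r)
    intercept = mean_log - gradient * mean_d  # estimate of ln(N0)
    return math.exp(intercept), math.exp(gradient) - 1


if __name__ == "__main__":
    # Made-up data (NOT the U.S. figures): N0 = 10, r = 0.34, with a +/-5% wobble.
    days = list(range(1, 37))
    totals = [10 * 1.34 ** d * (1.05 if d % 2 == 0 else 0.95) for d in days]
    n0_hat, r_hat = fit_exponential(days, totals)
    print(f"N0 is about {n0_hat:.2f}, r is about {r_hat:.3f}")
```

Fitting on the log scale is what graphing calculators and spreadsheets typically do for "exponential regression", which is why the fitted line in the log plot and the curve in the original plot describe the same model.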
This exponential model can be checked. After the exponential equation is converted into a linear one, the Pearson correlation coefficient is R = 0.997 and the coefficient of determination is R² = 0.994. An R² this close to 1 indicates a very good fit, suggesting that the exponential model is suitable for the data. In addition, a bivariate hypothesis test provides further evidence that the exponential model is valid.
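One common way to carry out these checks is sketched below in Python: compute Pearson's r between d and ln N, square it to get R² (for a straight-line fit with an intercept the two are linked by R² = r²), and test H0: ρ = 0 with the statistic t = r·sqrt(n - 2) / sqrt(1 - r²), which has n - 2 degrees of freedom. Whether this is exactly the bivariate test used in the original analysis is an assumption, and the function names are illustrative. The usage lines plug in R = 0.997 from the text and n = 36 days (February 25th to March 31st, 2020).

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


if __name__ == "__main__":
    r, n = 0.997, 36  # R from the text; 36 days from Feb 25 to Mar 31, 2020
    print(f"R^2 = {r ** 2:.3f}")  # 0.994, matching the text
    print(f"t = {correlation_t(r, n):.1f} with {n - 2} degrees of freedom")
```

The resulting t (about 75) is far above the 5% two-tailed critical value for 34 degrees of freedom (about 2.03), so H0 would be rejected. A strong linear correlation on the log scale supports the exponential form but does not by itself prove it.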
The detailed steps for calculating the Pearson correlation coefficient and the coefficient of determination, and for conducting the bivariate hypothesis test, will be taught in Gabe Math IA lessons.
According to the model above, there were more than 100,000 total cases after one month, although there were only 10 total cases at the start. The rate of increase grew rapidly, so how can it be reduced? Consider the exponential model $N = N_0 \times (1 + r)^{d}$, in which the daily growth rate can be written as r = k × β. To slow the growth of total cases, the critical step is to reduce r. If r in the U.S. model could be reduced slightly, from 0.34 to 0.31, with N0 unchanged, the model would give 64,932 total cases by March 28th, roughly half of the actual total in the U.S. on that date (121,105).
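Because N0 is the same in both scenarios, it cancels when the two predictions are divided, so the effect of cutting r can be checked without knowing N0. A small Python sketch, assuming (as the text does) that February 25th, 2020 is d = 1, which makes March 28th day 33 (2020 was a leap year); the helper names are hypothetical:

```python
from datetime import date


def day_index(day, start=date(2020, 2, 25)):
    """Day number d in the article's convention: the start date is d = 1."""
    return (day - start).days + 1


def prediction_ratio(r_new, r_old, d):
    """N0 * (1 + r_new) ** d divided by N0 * (1 + r_old) ** d; N0 cancels out."""
    return ((1 + r_new) / (1 + r_old)) ** d


if __name__ == "__main__":
    d = day_index(date(2020, 3, 28))
    print(d)
    print(round(prediction_ratio(0.31, 0.34, d), 3))
```

The ratio comes out at about 0.47: under the model, trimming r from 0.34 to 0.31 roughly halves the predicted total by day 33, whatever N0 is. This compares two model predictions, whereas the text compares the lower prediction with the reported total.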
In r = k × β, k is the average number of people an infected person is exposed to each day, and β is the probability that each exposure becomes an infection. Reducing either k or β lowers r, which means total cases increase more slowly. Ways to reduce k include quarantine and social distancing, which we've heard a lot about recently: quarantine reduces the chance that infected people are exposed to the susceptible population, and social distancing urges people to avoid crowded places. Both help reduce the risk of community-acquired infection.
Figure 5: The scatter plot (blue) and exponential regression model (orange) of total cases in Chinese Mainland from January 16th to January 29th.
For comparison, we also collected data on total cases in Chinese Mainland (the data are listed at the end of this article; January 16th is taken as the first day, d = 1). An exponential function is used to model the total cases in Chinese Mainland from January 16th to January 29th; it can be found either with technology or by hand. The coefficient of determination of this model is R² = 0.994, very close to 1, which means the exponential model fits these data well.
According to this equation, total cases in Chinese Mainland would have exceeded 1,000,000 by February 10th. However, Chinese Mainland implemented quarantine, social distancing and other measures quickly and strictly, so COVID-19 was brought under control. The actual total by March 31st was 81,154, far lower than the exponential model predicts. The coefficient of determination tells the same story: an exponential model fitted to the whole period from January 16th to March 31st has an R² of only 0.591. Both the actual figures and this R² indicate that the exponential model is not a good fit for the data in Figure 6.
Figure 6: The scatter plot (blue), exponential regression model (red) and logistic regression model (green) of total cases in Chinese Mainland from January 16th to March 31st. (The logistic model clearly fits the scatter plot better than the exponential model.)
Hence, a different model is needed to describe the relation between time (day) and total cases in Chinese Mainland from January 16th to March 31st. The scatter plot in Figure 6 closely resembles a logistic curve, and a point of inflection is visible on it. How to derive the logistic model and verify its validity will be taught in Gabe Math IA lessons.
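A common form of the logistic model is N(d) = L / (1 + a·e^(-bd)), where L is the eventual total. The Python sketch below uses hypothetical parameters (L = 80,000, a = 500, b = 0.25, chosen for illustration and not fitted to the Chinese Mainland data) to show two features mentioned above: the point of inflection, at d = ln(a)/b where N = L/2, and growth that looks exponential early on, because N ≈ (L/a)·e^(bd) while a·e^(-bd) is large. The function names are illustrative.

```python
import math


def logistic(d, L, a, b):
    """Logistic model N(d) = L / (1 + a * exp(-b * d)); L is the long-run total."""
    return L / (1 + a * math.exp(-b * d))


def inflection_day(a, b):
    """Day at which N = L / 2 and growth is fastest (the point of inflection)."""
    return math.log(a) / b


if __name__ == "__main__":
    L, a, b = 80_000, 500, 0.25  # hypothetical parameters, for illustration only
    d_star = inflection_day(a, b)
    print(f"inflection at d = {d_star:.1f}, N = {logistic(d_star, L, a, b):.0f}")
    # Early on, the logistic curve is close to the exponential (L / a) * e^(b d).
    for d in (1, 5, 10):
        print(d, round(logistic(d, L, a, b)), round(L / a * math.exp(b * d)))
    # Daily new cases peak next to the inflection point, then fall.
    new_cases = [logistic(d, L, a, b) - logistic(d - 1, L, a, b) for d in range(1, 61)]
    print("largest daily increase on day", new_cases.index(max(new_cases)) + 1)
```

This is also why an exponential model fits the early January data well but fails over the whole period: the two curves only separate as the logistic curve approaches its inflection point.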
Both the exponential and logistic regression models shown above can be found in IB Math, but they are basic models. When mathematical models are applied to research on the spread of a disease, many more variables need to be considered. Below is a brief introduction to some of the models most commonly used in such research.
The SI model is one of the simplest compartmental models, and many other models are derived from this basic form. In the SI model, S stands for the number of susceptible individuals and I for the number of infected individuals. Because infected individuals never leave I, the SI model suits infections from which people do not recover, such as HIV.
The SIR model has three compartments: S for the number of susceptible individuals, I for the number of infected individuals, and R for the number of recovered or deceased (or immune) individuals. It is used for infections after which recovered people are immune, such as measles.
The SIS model has only two compartments: S for the number of susceptible individuals and I for the number of infected individuals. People who recover are not immune and return to S, which is why S appears twice in the name; there is no separate compartment for the recovered. This model is used for infections that do not give immunity after recovery, such as malaria.
The SEIR model has four compartments: S for the number of susceptible individuals, E for the number of exposed individuals (infected but not yet infectious), I for the number of infectious individuals, and R for the number of recovered or deceased (or immune) individuals. It is used for infections with an incubation period, such as H1N1.
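To give a feel for how such a compartmental model is computed, here is a minimal Python sketch of the SEIR model using the standard equations dS/dt = -βSI/N, dE/dt = βSI/N - σE, dI/dt = σE - γI, dR/dt = γI, stepped forward with Euler's method. This β is a transmission rate per day, not the per-exposure probability β used earlier, so the code calls it `transmission_rate`. All parameter values are assumptions for illustration: a population of 1,000,000, a mean incubation period of 5 days (σ = 1/5), a mean infectious period of 7 days (γ = 1/7), and β = 2.2 × γ so that R0 = β/γ = 2.2, matching the estimate quoted above. The function name is hypothetical.

```python
def simulate_seir(population, initial_infectious, transmission_rate,
                  incubation_rate, recovery_rate, days, steps_per_day=10):
    """Integrate the SEIR equations with Euler's method.

    transmission_rate: beta, new infections per infectious person per day
                       in a fully susceptible population
    incubation_rate:   sigma, 1 / mean incubation period (days)
    recovery_rate:     gamma, 1 / mean infectious period (days)
    Returns a list of (S, E, I, R), one entry per day starting at day 0.
    """
    dt = 1 / steps_per_day
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            infections = transmission_rate * s * i / population * dt  # S -> E
            onsets = incubation_rate * e * dt                          # E -> I
            removals = recovery_rate * i * dt                          # I -> R
            s -= infections
            e += infections - onsets
            i += onsets - removals
            r += removals
        history.append((s, e, i, r))
    return history


if __name__ == "__main__":
    gamma = 1 / 7
    history = simulate_seir(1_000_000, 10, 2.2 * gamma, 1 / 5, gamma, days=365)
    peak_day = max(range(len(history)), key=lambda d: history[d][2])
    s, e, i, r = history[-1]
    print(f"infectious peak on day {peak_day}: {history[peak_day][2]:,.0f} people")
    print(f"share ever infected after a year: {r / 1_000_000:.0%}")
```

Dropping the E compartment (moving people straight from S to I) turns this into the SIR model. With these assumed values the share ever infected should settle close to the value given by the final-size relation z = 1 - e^(-R0·z). Dedicated ODE solvers are more accurate than fixed-step Euler, but Euler keeps the sketch short and dependency-free.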
All four models above are compartmental models, which are widely used in epidemiology. They can be built with software such as SPSS and MATLAB; how to use this software for the models and the IA will be covered in Gabe Math IA lessons.
In 2018, a Gabe Math student wrote an IB Math IA on the spread of Ebola using the SEIR model and received a very high score. Students who choose mathematical modelling as the topic of their IA can feel confident, because Gabe Math is very experienced in guiding such topics.
The sections above explained how an IB Math IA can be written using mathematical modelling, taking COVID-19 as an example. The content involved is not simple. Besides the software mentioned above, such as SPSS and MATLAB, the mathematics is sophisticated as well: working in MATLAB requires both ordinary and partial differential equations; the statistics goes beyond the bivariate hypothesis test, since confidence intervals are needed to estimate R0; and the Jacobian matrix is involved when working with the differential equations.
All of the technology and mathematics described above will be taught in Gabe Math IA lessons. IB students interested in mathematical modelling can scan the QR code below or search 17091912892 to add Gabe Math on WeChat.
Appendix
Figure 7: Daily total cases in Chinese Mainland from January 16th to March 31st
Figure 8: Daily total cases from February 25th to March 31st in the U.S.
References
1. Anderson RM, May RM. Infectious Diseases of Humans: Dynamics and Control. Oxford: Oxford University Press; 1992.
2. Li Q, Guan X, Wu P, Wang X, Zhou L, Tong Y, et al. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia. The New England Journal of Medicine. January 2020.
This article is original content by 嘉博數學 (Gabe Math). Reposting is welcome, but the source must be credited.
Copyright © 2020 Gabe Math. All Rights Reserved.
嘉博數學 (Gabe Math) WeChat: 17091912892