ARIMA models: testing for white noise
White noise is variation in your data that cannot be explained by any regression model.
And yet, there happens to be a statistical model for white noise. For time series data, it goes like this:
The additive white noise model:
Y_i = L_i + N_i
The observed value Y_i at time step i is the sum of the current level L_i and a random component N_i around the current level.
If the extent of random variation is proportional to the current level, then we have the following multiplicative version of the same model:
The multiplicative white noise model:
Y_i = L_i * N_i
If the current level L_i is constant for all i, i.e. L_i = L for all i, then the noise will be seen to fluctuate around a fixed level.
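The two models above can be sketched in a few lines of Python. This is only a minimal illustration, assuming normally distributed noise; the function names, the noise standard deviations and the seeds are hypothetical choices made for the example, not something the models prescribe.

```python
import numpy as np

def additive_white_noise(level, noise_sd, n, seed=0):
    """Y_i = L + N_i, with a constant level L and N_i ~ Normal(0, noise_sd)."""
    rng = np.random.default_rng(seed)
    return level + rng.normal(loc=0.0, scale=noise_sd, size=n)

def multiplicative_white_noise(level, noise_sd, n, seed=1):
    """Y_i = L * N_i, with a constant level L and N_i ~ Normal(1, noise_sd)."""
    rng = np.random.default_rng(seed)
    return level * rng.normal(loc=1.0, scale=noise_sd, size=n)

# Both series fluctuate around the fixed level of 100.
additive = additive_white_noise(level=100.0, noise_sd=5.0, n=1000)
multiplicative = multiplicative_white_noise(level=100.0, noise_sd=0.05, n=1000)
print(additive.mean(), multiplicative.mean())
```

In the multiplicative version the noise is centered at 1 rather than 0, so that the size of the fluctuation scales with the level.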
It's easy to generate a white noise data set. Here's how to do it in Excel:
[Figure: How to generate an additive white noise data set in Excel]
And here is the output plot of noise fluctuating around a constant level of 100:
[Figure: Additive white noise around level = 100]
The current level L_i often changes in response to real-world factors. For example, if L_i changes linearly in response to a set of regression variables X, then we get the following linear regression model:
Time series with regression variables plus noise:
Y_i = β·X_i + N_i
In the above equation, β is the vector of regression coefficients and X_i is a vector of regression variables.
There are three reasons why white noise matters:
1. If you discover, using some techniques which I will describe soon, that your data is basically white noise around a fixed level, then the best you can do is fit a model around that fixed level. It will be a waste of time to try to do anything better than that.
2. Suppose you have already fitted a regression model to a data set. If you are able to show that the residual errors of the fitted model are white noise, it means your model has done a great job of explaining the variance in the dependent variable. There is no information left to extract; whatever is left is noise. You can pat yourself on the back for a job well done!
3. The white noise model happens to be a stepping stone to another important and famous model in statistics called the Random Walk model, which I will explain next.
Let's again look at the White Noise model's equation:
Y_i = L_i + N_i
If we make the level L_i at time step i be the output value of the model from the previous time step (i-1), we get the Random Walk model, made famous in the popular literature by Burton Malkiel's A Random Walk Down Wall Street.
The Random Walk model:
Y_i = Y_(i-1) + N_i
The Random Walk model is like the mirage of the Data Science desert. It has lured many profit-thirsty investors into betting (and losing) their shirts on illusory trends in stock price movements, movements that were in reality little more than a random walk.
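The Random Walk model described above can be sketched in Python as a running sum of noise terms. This is a minimal sketch, assuming normally distributed steps; the function name, the starting value and the step size are hypothetical choices for illustration.

```python
import numpy as np

def random_walk(start, noise_sd, n, seed=0):
    """Y_i = Y_(i-1) + N_i, with Y_0 = start and N_i ~ Normal(0, noise_sd)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=noise_sd, size=n - 1)
    # Each value is the starting value plus the sum of all noise terms so far.
    return np.concatenate(([start], start + np.cumsum(noise)))

walk = random_walk(start=100.0, noise_sd=1.0, n=1000)
print(walk[:5])
```

Plotting `walk` for a few different seeds tends to produce stretches that look like trends, even though every step is pure noise.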
Here's a plot of data that was generated using the Random Walk model:
[Figure: A Random Walk]
Just tell me you don't see any trends in this plot!
If you are not completely convinced that the above data can be generated by a purely random process, let's dispel any remaining illusions by showing how to generate this data in Excel:
[Figure: How to generate Random Walk data in Excel]
Let's look at how we can make use of our knowledge of white noise and random walks to try to detect their presence in time series data.
We'll look at 3 tests to determine whether your time series is, in reality, just white noise:
1. Auto-correlation plots
2. The Box-Pierce test
3. The Ljung-Box test
Testing for white noise using auto-correlation plots
When two variables move up or down in unison (or when one goes up, the other goes down), they are said to be positively (or negatively) correlated. The correlation coefficient can be used to measure the degree of linear correlation between two such variables:
Linear correlation between X and Y:
r_XY = E[(X - E(X))(Y - E(Y))] / (σ_X · σ_Y)
In the above formula, E(X) and E(Y) are the expected (i.e. mean) values of X and Y. σ_X and σ_Y are the standard deviations of X and Y.
In time series data, correlations often exist between the current value and values that are one or more time steps older, i.e. between Y_i and Y_(i-1), between Y_i and Y_(i-2), and so on. Stock price changes often show such patterns of positive and negative correlations (and beware, so do data containing random walks!).
[Figure: stock price chart from StockCharts.com, used under its terms of use]
Because the values are correlated with past versions of themselves, we call them auto-correlated, meaning self-correlated.
Here is the formula for calculating the auto-correlation coefficient between Y_i and Y_(i-k):
Auto-correlation coefficient at lag k:
r_k = [Σ_(i=k+1..n) (Y_i - Ȳ)(Y_(i-k) - Ȳ)] / [Σ_(i=1..n) (Y_i - Ȳ)²]
Here Ȳ is the mean of the n observed values.
Before we can show how this auto-correlation coefficient r_k can be used to detect white noise, we need to take a short and pleasant side-trip into the land of random variables. I'll explain why r_k is an (approximately) normally distributed random variable and how this property of r_k can be used to detect white noise.
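The lag-k auto-correlation coefficient can be computed directly with NumPy. The sketch below assumes the standard sample estimator (deviations from the overall mean, normalized by the total sum of squared deviations); the function name `autocorr` is a hypothetical helper, not part of any library.

```python
import numpy as np

def autocorr(y, k):
    """Sample auto-correlation coefficient r_k between Y_i and Y_(i-k)."""
    y = np.asarray(y, dtype=float)
    if k == 0:
        return 1.0  # a series is perfectly correlated with itself
    d = y - y.mean()  # deviations from the overall mean
    return float(np.sum(d[k:] * d[:-k]) / np.sum(d * d))

y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(autocorr(y, 1))  # 0.4
print(autocorr(y, 2))  # -0.1
```

For this tiny series the deviations are -2, -1, 0, 1, 2, so the lag-1 numerator is 2 + 0 + 0 + 2 = 4 and the denominator is 10, giving r_1 = 0.4.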
Distribution of the LAG-k auto-correlation coefficient r_k
For any lag k, r_k is approximately a normally distributed random variable with some mean µ_k and variance σ²_k.
To understand why, consider this thought experiment:
Take a time series data set containing 100,000 time points.
Draw 5000 randomly selected samples from this data set. Suppose each sample consists of 100 consecutive time points.
For each sample, calculate the LAG-1 auto-correlation coefficient r_1 using the above formula for r_k.
Each time, r_1 will come out to be some value between -1 and 1. So we end up with 5000 values of r_1, each a number between -1 and 1. Thus r_1 is a random variable for which we have measured 5000 values.
By appealing to the limit theorems of statistics, it can be shown that for large samples r_1 is approximately normally distributed, centered at some population mean, which we'll call µ_1, with some variance, which we'll call σ²_1. In practice, the mean and variance of the 5000 measured values of r_1 serve as estimates of µ_1 and σ²_1.
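The thought experiment above can be simulated. The sketch below is only an illustration: it assumes the 100,000-point series is standard normal white noise (the thought experiment itself does not require that), and the helper name and seed are hypothetical.

```python
import numpy as np

def lag1_autocorr(y):
    """Sample LAG-1 auto-correlation coefficient r_1."""
    d = y - y.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))

rng = np.random.default_rng(42)
series = rng.normal(0.0, 1.0, size=100_000)  # assumed: pure white noise

window, n_samples = 100, 5000
# Random starting points for 5000 windows of 100 consecutive time points.
starts = rng.integers(0, series.size - window + 1, size=n_samples)
r1 = np.array([lag1_autocorr(series[s:s + window]) for s in starts])

print("mean of r_1:", r1.mean())
print("std of r_1: ", r1.std())
```

A histogram of `r1` should look roughly bell-shaped. For this white noise input, the mean should sit near zero and the standard deviation near 0.1.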
By repeating the above experiment for all lags k, it can be shown that the auto-correlation coefficients for all lags are (approximately) normally distributed random variables with mean µ_k and variance σ²_k.
Symbolically:
For all lags k: r_k ~ N(µ_k, σ²_k)
Implications for detecting white noise
If the time series is white noise, then in theory its current value Y_i ought not to be correlated at all with past values Y_(i-1), Y_(i-2), etc., and the corresponding auto-correlation coefficients r_1, r_2, etc. will be zero or close to zero.
I.e., when the time series is white noise, the true value of r_k is 0 for all k = 1, 2, 3, …
But we have just seen that r_k is a N(µ_k, σ²_k) random variable.
Putting the above two facts together, we arrive at the following first important implication:
If the time series is white noise, then the auto-correlation coefficient r_k for all lags k will have a zero mean and some variance σ²_k.
Symbolically:
For all lags k, under white noise conditions: r_k ~ N(0, σ²_k)
But what about the variance σ²_k of the coefficients r_k?
Anderson, Bartlett and Quenouille have shown that under white noise conditions, the standard deviation σ_k is as follows:
σ_k = 1/sqrt(n)
where n is the sample size. Recollect that in our thought experiment, n was 100.
Thus, we know that r_k under white noise conditions has the following distribution:
Distribution of auto-correlation coefficients when the data set is pure white noise:
r_k ~ N(0, 1/n)
An important property of the normal distribution is that approximately 95% of it lies within 1.96 standard deviations of the mean. In our case, the mean is 0 and the standard deviation is 1/sqrt(n), so we get the following 95% confidence interval for the auto-correlation coefficients:
[-1.96/sqrt(n), +1.96/sqrt(n)]
These results yield the following procedure for conducting the white noise test using the auto-correlation coefficients r_k:
1. Calculate the first k auto-correlation coefficients r_1, …, r_k. k can be set to some high enough value depending on the length n of the time series data set.
2. Calculate the 95% confidence interval [-1.96/sqrt(n), +1.96/sqrt(n)].
3. If r_k lies within the above confidence interval for all k, conclude that the time series is possibly just white noise. We say possibly because if we experiment with larger sample sizes, i.e. larger n, the confidence interval will shrink, and values of r_k that were previously inside the 95% bounds may now lie outside them.
4. If any of the r_k lie outside the confidence interval, then the time series possibly has information in it. Keep in mind that even for pure white noise, about 1 in 20 coefficients is expected to fall outside the 95% bounds by chance, so a single marginal exceedance is weak evidence.
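The procedure above can be sketched as a small function that lists the lags whose coefficient falls outside the 95% bounds. This is a minimal sketch on synthetic data; the helper names `acf` and `lags_outside_bounds`, the series length and the seed are assumptions made for illustration.

```python
import numpy as np

def acf(y, max_lag):
    """Sample auto-correlation coefficients r_1 ... r_max_lag."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[k:] * d[:-k]) / denom
                     for k in range(1, max_lag + 1)])

def lags_outside_bounds(y, max_lag=40):
    """Lags k whose r_k lies outside [-1.96/sqrt(n), +1.96/sqrt(n)]."""
    bound = 1.96 / np.sqrt(len(y))
    r = acf(y, max_lag)
    return [k for k, r_k in enumerate(r, start=1) if abs(r_k) > bound]

rng = np.random.default_rng(1)
noise = rng.normal(0.0, 1.0, size=5000)   # white noise
walk = np.cumsum(noise)                   # random walk built from it

print("white noise, lags outside bounds:", lags_outside_bounds(noise))
print("random walk, number of lags outside:", len(lags_outside_bounds(walk)))
```

For the white noise series only a handful of lags, if any, should fall outside the bounds, whereas for the random walk nearly every lag does.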
Let's illustrate the above procedure using a real-world time series of 5000 decibel level measurements taken at a restaurant using the Google Science Journal app.
The data set can be downloaded from here.
We'll use the pandas library to load the data set from the csv file and plot it:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# Load the decibel readings, using the first column (TimeIndex) as the index
df = pd.read_csv('restaurant_decibel_level.csv', header=0, index_col=[0])
Let's print the top 10 rows:
df.head(10)

             Decibel
TimeIndex
0          55.931323
40         57.779260
80         62.956952
140        65.158100
180        60.325242
220        45.411725
262        55.958807
300        62.021807
340        62.222563
380        56.156684
Let's plot all 5000 values in the series:
[Figure: Decibel level at a restaurant]
Let's fetch and plot the auto-correlation coefficients for the first 40 lags. We'll use the statsmodels library to do that.
import statsmodels.graphics.tsaplots as tsa

tsa.plot_acf(df['Decibel'], lags=40, alpha=0.05, title='Auto-correlation coefficients for lags 1 through 40')
The alpha=0.05 tells statsmodels to also plot the 95% confidence interval region. We get the following plot:
[Figure: Auto-correlation plot for the decibel level time series]
As we can see, the time series contains significant auto-correlations up through lag 17. Incidentally, the auto-correlation at lag 0 is always 1.0, as a value is always perfectly correlated with itself.
There is a wave-like pattern in the auto-correlation plot that indicates that there could be some seasonality contained in the data. We can try to identify and isolate the seasonality by decomposing the time series into the trend, seasonality and noise components.
Related read: What is time series decomposition and how does it work
For now we'll focus on the noise portion. The bottom line is that this time series, in its current form, does not appear to be pure white noise.
Next, we'll run two more tests on the time series to confirm this.
The Chi-squared test for white noise detection
The Chi-squared test is based on this powerful result in statistics: the sum of squares of k independent standard normal random variables is a Chi-squared distributed random variable with k degrees of freedom.
[Figure from Wikimedia, under CC BY 3.0]
The actual test is called the Box-Pierce test and its test statistic is called the Q statistic. Its formula is as follows:
Box-Pierce test statistic:
Q = n · Σ_(j=1..k) r_j²
Under white noise conditions each sqrt(n)·r_j is approximately a standard normal random variable, so it can be shown that if the underlying data set is white noise, Q is approximately Chi-squared distributed with k degrees of freedom and its expected value is about k.
For any given time series, one can check whether Q is larger than white noise would plausibly produce by looking up the p-value of the test statistic in the Chi-square tables for k degrees of freedom. Usually, a p-value of less than 0.05 indicates a significant auto-correlation that cannot be attributed to chance.
The Ljung-Box test for white noise detection
The Ljung-Box test improves upon the Box-Pierce test to obtain a test statistic whose distribution is closer to the Chi-square distribution than that of the Q statistic. The test statistic of the Ljung-Box test is calculated as follows, and it is also approximately Chi-square(k) distributed:
Ljung-Box test statistic:
Q* = n(n+2) · Σ_(j=1..k) r_j² / (n - j)
Here, n is the number of data points in the time series and k is the number of time lags to be considered. As with the Box-Pierce test, if the underlying data set is white noise, the expected value of this Chi-square distributed random variable is about k. Again, a p-value of less than 0.05 indicates a significant auto-correlation that cannot be attributed to chance.
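Both test statistics can be computed by hand from the auto-correlation coefficients, which may help make the formulas concrete. The sketch below assumes the standard definitions Q = n·Σ r_j² and Q* = n(n+2)·Σ r_j²/(n-j), and uses SciPy's `chi2.sf` for the upper-tail p-value. The helper names and the synthetic data are hypothetical, and this is not a reimplementation of any library routine.

```python
import numpy as np
from scipy.stats import chi2

def acf(y, max_lag):
    """Sample auto-correlation coefficients r_1 ... r_max_lag."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[j:] * d[:-j]) / denom
                     for j in range(1, max_lag + 1)])

def portmanteau_tests(y, k):
    """Box-Pierce and Ljung-Box statistics with Chi-square(k) p-values."""
    n = len(y)
    r = acf(y, k)
    lags = np.arange(1, k + 1)
    q_bp = n * np.sum(r ** 2)
    q_lb = n * (n + 2) * np.sum(r ** 2 / (n - lags))
    return {"bp_stat": q_bp, "bp_pvalue": chi2.sf(q_bp, df=k),
            "lb_stat": q_lb, "lb_pvalue": chi2.sf(q_lb, df=k)}

rng = np.random.default_rng(3)
noise = rng.normal(0.0, 1.0, size=2000)
noise_result = portmanteau_tests(noise, k=20)
walk_result = portmanteau_tests(np.cumsum(noise), k=20)
print(noise_result)
print(walk_result)
```

The Ljung-Box statistic is always slightly larger than the Box-Pierce statistic for the same data, because each term is scaled by (n+2)/(n-j), which is greater than 1.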
Example: Testing for white noise using the Ljung-Box test in Python
Let's run the Ljung-Box test on the restaurant decibel level data set. We will test up to 40 lags and we'll ask the test to also run the Box-Pierce test.
import statsmodels.stats.diagnostic as diag
diag.acorr_ljungbox(df['Decibel'], lags=[40], boxpierce=True, model_df=0, period=None, return_df=None)
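We get the following output (depending on the statsmodels version, the same four values may instead come back as a DataFrame with columns lb_stat, lb_pvalue, bp_stat and bp_pvalue):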
(array([13172.80554476]), array([0.]), array([13156.42074648]), array([0.]))
The value 13172.80554476 is the test statistic of the Ljung-Box test and 0.0 is its p-value as per the Chi-square(k=40) tables.
The value 13156.42074648 is the test statistic of the Box-Pierce test and 0.0 is its p-value as per the Chi-square(k=40) tables.
As we can see, both p-values are less than 0.01, so we can reject the null hypothesis of white noise at the 1% significance level: the restaurant decibel level time series is not pure white noise.
Earlier on, we introduced the Random Walk as a special case of the White Noise model and pointed out how easy it is to mistake one for a pattern or trend that can be predicted.
We'll look at how to avoid making this mistake by applying a technique that will bring out the true random nature of the Random Walk.
Detecting Random Walks
Random walks are often highly auto-correlated. In fact, a random walk is simply accumulated white noise: each value is the previous value plus a white noise term.
The white noise detection tests presented above will latch on to these auto-correlations and conclude that the time series is not white noise.
The remedy is to take the first difference of the time series that is suspected to be a random walk, and run the white noise tests on the differenced series.
If the original time series is a random walk, its first difference is pure white noise, since Y_i - Y_(i-1) = N_i.
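This can be checked on synthetic data, where the generating process is known. The sketch below is only an illustration under assumed settings (normally distributed steps, 5000 points, a hypothetical helper `lag1_autocorr`): it builds a random walk from known noise, differences it, and compares the LAG-1 auto-correlation before and after.

```python
import numpy as np

def lag1_autocorr(y):
    """Sample LAG-1 auto-correlation coefficient r_1."""
    d = y - y.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))

rng = np.random.default_rng(7)
noise = rng.normal(0.0, 1.0, size=5000)
walk = 100.0 + np.cumsum(noise)   # Y_i = Y_(i-1) + N_i
differenced = np.diff(walk)       # Y_i - Y_(i-1), recovers N_i

print("r_1 of the random walk:      ", lag1_autocorr(walk))
print("r_1 of the differenced walk: ", lag1_autocorr(differenced))
```

The random walk itself shows a LAG-1 auto-correlation close to 1, while the differenced series, which is just the original noise, shows a value close to 0.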
Let's illustrate this:
We'll start by loading a data set that is suspected to be a random walk. The data set can be downloaded from here.
df = pd.read_csv('random_walk.csv', header=0, index_col=[0])
Let's plot it to see what it looks like:
df.plot()
plt.show()
Let's run the Ljung-Box white noise test on this data:
diag.acorr_ljungbox(df['Y_i'], lags=[40], boxpierce=True)
We get the following result:
(array([393833.91252517]), array([0.]), array([392952.07675659]), array([0.]))
The p-value of 0.0 indicates that we must strongly reject the null hypothesis that the data is white noise. Both the Ljung-Box and the Box-Pierce tests conclude that this data set has not been generated by a pure white noise process.
Taken at face value, this is misleading: the series is indeed not white noise, yet it may still be driven by nothing but noise.
Let's see if things change after we take the first difference of the data, i.e. we create a new data set with values Y_i - Y_(i-1):
diff_Y_i = df['Y_i'].diff()
# drop the NaN in the first row
diff_Y_i = diff_Y_i.dropna()
Let's plot the differenced data set:
diff_Y_i.plot()
plt.show()
We now see a very different picture:
[Figure: The differenced data set]
Here is the zoomed-in view:
[Figure: Zoomed-in view of the differenced data set]
Let's run the Ljung-Box test on the differenced data set:
diag.acorr_ljungbox(diff_Y_i, lags=[40], boxpierce=True)
We get the following output:
(array([32.93405364]), array([0.77822417]), array([32.85051846]), array([0.78137548]))
Notice that this time the test statistics, 32.934 reported by the Ljung-Box test and 32.850 reported by the Box-Pierce test, are much smaller. The corresponding p-values from the Chi-square(k=40) tables are 0.778 and 0.781 respectively, which are well above 0.05. So we cannot reject the null hypothesis that the data (i.e. the differenced time series) is pure white noise.
The conclusion to be drawn from this exercise is that one should not fit anything except the White Noise model to this differenced data.
Summary
- The white noise model can be used to represent the nature of noise in a data set.
- Testing for white noise is one of the first things that a data scientist should do, so as to avoid spending time fitting models to data sets that offer no meaningfully extractable information.
- If a data set is not white noise, then after fitting a model to the data, one should run a white noise test on the residual errors to get a sense of how much information the model has been able to extract from the data.
- For time series data, auto-correlation plots and the Ljung-Box test offer two useful techniques for determining whether the time series is, in reality, just white noise.
Data set of restaurant decibel levels is Copyright Sachin Date under CC-BY-NC-SA.
Amgen stock price chart is from stockcharts.com under these terms of use.
Anderson, R. L., Distribution of the Serial Correlation Coefficient, Annals of Mathematical Statistics, Vol. 13, No. 1 (1942), pp. 1–13.
Bartlett, M. S., On the Theoretical Specification and Sampling Properties of Autocorrelated Time-Series, Supplement to the Journal of the Royal Statistical Society, Vol. 8, No. 1 (1946), pp. 27–41.
Quenouille, M. H., The Joint Distribution of Serial Correlation Coefficients, The Annals of Mathematical Statistics, Vol. 20, No. 4 (Dec. 1949), pp. 561–571.
Hyndman, R. J., Athanasopoulos, G., Forecasting: Principles and Practice, OTexts.
All images in this article are copyright Sachin Date under CC-BY-NC-SA, unless a different source and copyright are mentioned underneath the image.
Translated from: https://towardsdatascience.com/the-white-noise-model-1388dbd0a7d