White Noise Test: Topics and FAQs - CSDN

2021-01-06 CSDN Technical Community

ARIMA model, white noise test

White noise is the variation in your data that cannot be explained by any regression model.

And yet, there happens to be a statistical model for white noise. It goes like this for time series data:

Y_i = L_i + N_i

The additive white noise model

The observed value Y_i at time step i is the sum of the current level L_i and a random component N_i around the current level.

If the extent of random variation is proportional to the current level, then we have the following multiplicative version of the same model:

Y_i = L_i * N_i

The multiplicative white noise model
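To make the difference between the two forms concrete, here is a minimal Python sketch that is not part of the original article. It assumes a linearly rising level and normally distributed noise (centered on 0 for the additive form and on 1 for the multiplicative form). The names `level`, `additive`, `multiplicative` and `spread`, and all numeric values, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)          # seeded so the sketch is repeatable
n = 2000
level = np.linspace(100.0, 1000.0, n)    # assumed level L_i that rises over time

# Additive form: Y_i = L_i + N_i, with N_i centered on 0
additive = level + rng.normal(loc=0.0, scale=10.0, size=n)

# Multiplicative form: Y_i = L_i * N_i, with N_i centered on 1 (an assumption)
multiplicative = level * rng.normal(loc=1.0, scale=0.05, size=n)

def spread(y, lvl, part):
    """Standard deviation of (y - lvl) over one slice of the series."""
    return float(np.std((y - lvl)[part]))

first, last = slice(0, n // 2), slice(n // 2, n)
print("additive spread (first half, second half):      ",
      spread(additive, level, first), spread(additive, level, last))
print("multiplicative spread (first half, second half):",
      spread(multiplicative, level, first), spread(multiplicative, level, last))
```

In the additive series the spread around the level stays roughly the same in both halves, while in the multiplicative series it grows as the level grows.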

If the current level L_i is constant for all i, i.e. L_i = L for all i, then the noise will be seen to fluctuate around a fixed level.

It’s easy to generate a white noise data set. Here’s how to do it in Excel:

How to generate an additive white noise data set in Excel

And here is the output plot of noise that is fluctuating around a constant level of 100:

Additive white noise around level = 100
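The Excel formula itself is not reproduced in this text, so here is a rough Python equivalent as a sketch. It assumes normally distributed noise with a standard deviation of 5; the article does not state the spread, and `noise_sd` and `n` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the sketch is repeatable

level = 100.0        # constant level L, as in the plot above
noise_sd = 5.0       # assumed spread of the noise; not given in the article
n = 500

noise = rng.normal(loc=0.0, scale=noise_sd, size=n)   # N_i
y = level + noise                                      # Y_i = L + N_i

print(y[:5])
print("sample mean:", y.mean(), "sample std:", y.std(ddof=1))
```

The sample mean should land close to 100 and the sample standard deviation close to the assumed 5, which is all that "fluctuating around a constant level" means here.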

The current level L_i often changes in response to real world factors. For example, if L_i changes linearly in response to a set of regression variables X, then we get the following linear regression model:

Y_i = β·X_i + N_i

Time series with regression variables plus noise

In the above equation, β is the vector of regression coefficients and X_i is a vector of regression variables.
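As a hedged illustration of this model, the sketch below simulates a level that depends linearly on two made-up regression variables, adds white noise, and recovers the coefficients by ordinary least squares. The coefficient values, the intercept column and the variable ranges are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Two hypothetical regression variables plus a column of ones for the intercept
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(-5, 5, n)])
beta_true = np.array([100.0, 2.0, -3.0])     # assumed coefficients

noise = rng.normal(0.0, 1.0, n)              # N_i
y = X @ beta_true + noise                    # Y_i = beta . X_i + N_i

# Ordinary least squares estimate of beta
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

print("estimated beta:", beta_hat)
print("residual mean:", residuals.mean(),
      "residual std:", residuals.std(ddof=X.shape[1]))
```

If the fit has captured the level, what remains in `residuals` should look like the noise that was added: centered on zero with a standard deviation near 1.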

Why is it important to study the white noise model?

There are three reasons:

First, if you discover, using techniques that I will describe shortly, that your data is essentially white noise around a fixed level, then the best you can do is fit a model at that fixed level. Trying to do anything better than that would be a waste of time.

Second, suppose you have already fitted a regression model to a data set. If you can show that the residual errors of the fitted model are white noise, it means your model has done a great job of explaining the variance in the dependent variable. There is no information left to extract; whatever is left is noise. You can pat yourself on the back for a job well done!

Third, the white noise model happens to be a stepping stone to another important and famous model in statistics called the Random Walk model, which I will explain in the next section.

The Random Walk Model

Let’s again look at the White Noise Model’s equation:

Y_i = L_i + N_i

If we make the level L_i at time step i be the output value of the model from the previous time step (i-1), we get the Random Walk model, made famous in the popular literature by Burton Malkiel’s A Random Walk Down Wall Street.

Y_i = Y_(i-1) + N_i

The Random Walk Model

The Random Walk model is like the mirage of the Data Science desert. It has lured many profit-thirsty investors into betting (and losing) their shirts on illusions of trends in stock price movements, movements that were in reality little more than a random walk.

Here’s a plot of data that was generated using the Random Walk model:

A Random Walk

Just tell me you don’t see any trends in this plot!

If you are not completely convinced that the above data can be generated by a purely random process, let’s puff away any remaining illusions by showing how to generate this data in Excel:

How to generate Random Walk data in Excel
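The same construction can be sketched in Python. This is a minimal example under assumed settings (standard normal steps and a starting value of 0), not a transcription of the Excel sheet. It builds the walk twice: once with a cumulative sum, and once with an explicit loop that mirrors filling a spreadsheet column row by row.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000

steps = rng.normal(loc=0.0, scale=1.0, size=n)   # white noise N_i
walk = np.cumsum(steps)                          # Y_i = Y_(i-1) + N_i, starting from Y_0 = 0

# Equivalent explicit loop, closer to how a spreadsheet column would be filled in
walk_loop = np.empty(n)
previous = 0.0
for i in range(n):
    previous = previous + steps[i]
    walk_loop[i] = previous

print(walk[:5])
print("loop and cumsum agree:", np.allclose(walk, walk_loop))
```

Every value is just the previous value plus a fresh random draw, so any apparent trend in a plot of `walk` is produced by accumulated noise alone.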

Let’s look at how we can make use of our knowledge of white noise and random walks to try to detect their presence in time series data.

How to detect white noise in a time series data set

We’ll look at three tests to determine whether your time series is, in reality, just white noise:

Auto-correlation plots

The Box-Pierce test

The Ljung-Box test

Testing for white noise using auto-correlation plots

When two variables move up or down in unison (or if one value goes up, the other one goes down), they are said to be positively (or negatively) correlated. The correlation coefficient can be used to measure the degree of linear correlation between two such variables:

r_XY = E[(X - E(X)) * (Y - E(Y))] / (σ_X * σ_Y)

Linear correlation between X and Y

In the above formula, E(X) and E(Y) are the expected (i.e. mean) values of X and Y. σ_X and σ_Y are the standard deviations of X and Y.
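Here is a small sketch of that calculation, using sample means in place of the expectations. The data is synthetic and the helper name `correlation` is an assumption; NumPy's built-in `np.corrcoef` is printed alongside for comparison.

```python
import numpy as np

def correlation(x, y):
    """Pearson correlation: E[(X - E(X)) * (Y - E(Y))] / (sigma_X * sigma_Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    covariance = np.mean((x - x.mean()) * (y - y.mean()))
    return covariance / (x.std() * y.std())   # population std (ddof=0) matches np.mean above

rng = np.random.default_rng(3)
x = rng.normal(size=500)
y_pos = 2.0 * x + rng.normal(scale=0.5, size=500)    # moves with x
y_neg = -2.0 * x + rng.normal(scale=0.5, size=500)   # moves against x

print(correlation(x, y_pos), correlation(x, y_neg))
print(np.corrcoef(x, y_pos)[0, 1])   # NumPy's built-in, for comparison
```

The first pair comes out strongly positive and the second strongly negative, matching the "in unison" versus "one up, one down" description.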

In time series data, correlations often exist between the current value and values that are 1 time step or more older than the current value, i.e. between Y_i and Y_(i-1), between Y_i and Y_(i-2) and so on. Stock price changes often show such patterns of positive and negative correlations (and beware, so do data containing random walks!).

Chart courtesy of StockCharts.com, under its terms of use

Because the values are correlated with past versions of themselves, we call them auto-correlated, meaning self-correlated.

Here is the formula for calculating the auto-correlation coefficient between Y_i and Y_(i-k):

r_k = Σ_{i=k+1..n} (Y_i - Ȳ) * (Y_(i-k) - Ȳ) / Σ_{i=1..n} (Y_i - Ȳ)², where Ȳ is the sample mean of the series and n is its length

Auto-correlation coefficient at lag k
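A minimal sketch of this coefficient in Python follows. It uses the commonly used sample definition (the sum of lagged products of deviations from the mean, divided by the total sum of squared deviations), which is an assumption about the exact form intended here. The function name `autocorr` and the toy series are illustrative.

```python
import numpy as np

def autocorr(y, k):
    """Lag-k sample auto-correlation coefficient r_k of a 1-D series."""
    y = np.asarray(y, dtype=float)
    if k == 0:
        return 1.0
    d = y - y.mean()
    # Numerator pairs Y_i with Y_(i-k); denominator is the total sum of squares
    return float(np.sum(d[k:] * d[:-k]) / np.sum(d * d))

# A tiny series that alternates low/high, so lag 1 is negative and lag 2 positive
series = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
print(autocorr(series, 1), autocorr(series, 2))   # -5/6 and 4/6
```

For the alternating toy series, neighbors always sit on opposite sides of the mean (negative r_1), while values two steps apart sit on the same side (positive r_2).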

Before we can show how this auto-correlation coefficient r_k can be used to detect white noise, we need to take a short and pleasant side-trip into the land of random variables. I’ll explain why r_k is a normally distributed random variable and how this property of r_k can be used to detect white noise.

Distribution of the LAG-k auto-correlation coefficient r_k

For any lag k, r_k is an approximately normally distributed random variable with some mean µ_k and variance σ²_k.

To understand why, consider this thought experiment:

Take a time series data set containing 100,000 time points.

Draw 5000 randomly selected samples from this data set. Suppose each sample consists of 100 consecutive time points.

For each sample, calculate the LAG-1 auto-correlation coefficient r_1 using the above formula for r_k.

Each time, r_1 will come out to be some value between -1 and 1. So we end up with 5000 values of r_1, each a number between -1 and 1. Thus r_1 is a random variable for which we have measured 5000 values.

By appealing to the limit theorems of statistics, it can be shown that r_1 is approximately normally distributed, centered at some population mean, which we’ll call µ_1, with some variance, which we’ll call σ²_1. In practice, µ_1 and σ²_1 can be estimated by the sample mean and sample variance of the 5000 values of r_1 that we measured.

By repeating the above experiment for all lags k, it can be shown that the auto-correlation coefficients for all lags are normally distributed random variables with mean µ_k and variance σ²_k.
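The thought experiment is easy to run as a simulation. The sketch below assumes the long series is itself pure white noise (the text does not say what it contains) and lets the 5000 windows overlap. The helper `lag1_autocorr` and the seed are illustrative choices.

```python
import numpy as np

def lag1_autocorr(sample):
    """LAG-1 sample auto-correlation coefficient r_1 of one window."""
    d = sample - sample.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d * d)

rng = np.random.default_rng(11)
series = rng.normal(size=100_000)          # assumed: the long series is pure white noise

n_samples, sample_len = 5000, 100
# Random start positions; each sample is 100 consecutive time points
starts = rng.integers(0, len(series) - sample_len + 1, size=n_samples)
r1 = np.array([lag1_autocorr(series[s:s + sample_len]) for s in starts])

print("mean of r_1:", r1.mean())
print("std of r_1: ", r1.std(ddof=1))
print("share within +/-1.96 std of the mean:",
      np.mean(np.abs(r1 - r1.mean()) < 1.96 * r1.std(ddof=1)))
```

If r_1 is close to normally distributed, roughly 95% of the 5000 values should fall within 1.96 standard deviations of their mean, which is what the last line checks.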

Symbolically:

r_k ~ N(µ_k, σ²_k)

For all lags k, r_k is a normally distributed random variable

Implications for detecting white noise

If the time series is white noise, then in theory its current value Y_i ought not to be correlated at all with past values Y_(i-1), Y_(i-2), etc., and the corresponding auto-correlation coefficients r_1, r_2, etc. will be zero or close to zero.

That is, when the time series is white noise, the expected value of r_k is 0 for all k = 1, 2, 3, …

But we have just seen that r_k is a N(µ_k, σ²_k) random variable.

Putting the above two facts together, we arrive at the following first important implication:

If the time series is white noise, then the auto-correlation coefficient r_k for all lags k will have a zero mean and some variance σ²_k.

Symbolically:

E(r_k) = 0 for all k

For all lags k, r_k has zero mean under white noise conditions

But what about the variance σ²_k of the coefficients r_k?

Anderson, Bartlett and Quenouille have shown that under white noise conditions, the standard deviation σ_k is as follows:

σ_k = 1/sqrt(n)

where n is the sample size. Recall that in our thought experiment, n was 100.

Thus, we know that r_k under white noise conditions has the following distribution:

r_k ~ N(0, 1/n)

Distribution of auto-correlation coefficients when the data set is pure white noise

An important property of the normal distribution is that approximately 95% of it lies within 1.96 standard deviations from the mean. In our case, the mean is 0 and standard deviation is 1/sqrt(n), so we get the following 95% confidence interval for the auto-correlation coefficients:

-1.96/sqrt(n) ≤ r_k ≤ +1.96/sqrt(n)

These results yield the following procedure for conducting the white noise test using the auto-correlation coefficients r_k:

Calculate the first k auto-correlation coefficients r_1, …, r_k. k can be set to some sufficiently high value depending on the length n of the time series data set.

Calculate the 95% confidence interval [-1.96/sqrt(n), +1.96/sqrt(n)].

If r_k lies within the above confidence interval for all k, conclude at a 95% confidence level that the time series is possibly just white noise. We say possibly because if we experiment with larger sample sizes, i.e. larger n, the confidence interval will shrink, and values of r_k that were previously inside the 95% bounds may now lie outside them.

If any of the r_k lie outside the confidence interval, then the time series possibly has information in it.
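The steps above can be sketched in a few lines of NumPy. This is an illustration on synthetic data rather than the article's own code; the helper names `acf` and `lags_outside_bounds`, the choice of 40 lags and the two test series are assumptions.

```python
import numpy as np

def acf(y, max_lag):
    """Sample auto-correlation coefficients r_1 .. r_max_lag."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[k:] * d[:-k]) / denom for k in range(1, max_lag + 1)])

def lags_outside_bounds(y, max_lag=40):
    """Return the 95% bound and the lags whose r_k falls outside +/- that bound."""
    n = len(y)
    bound = 1.96 / np.sqrt(n)
    r = acf(y, max_lag)
    outside = [k for k, r_k in enumerate(r, start=1) if abs(r_k) > bound]
    return bound, outside

rng = np.random.default_rng(5)
white = rng.normal(loc=100.0, scale=5.0, size=5000)   # synthetic white noise
walk = np.cumsum(rng.normal(size=5000))               # synthetic random walk

for name, data in [("white noise", white), ("random walk", walk)]:
    bound, outside = lags_outside_bounds(data)
    print(f"{name}: bound = {bound:.4f}, lags outside = {outside}")
```

One caveat when reading the result: with 40 lags checked at the 5% level, about two coefficients can be expected to fall outside the bounds by chance even for genuine white noise, so a couple of marginal excursions is weaker evidence than a long run of large coefficients.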

Example: White noise detection using Python

Let’s illustrate the above procedure using a real world time series of 5000 decibel level measurements taken at a restaurant using the Google Science Journal app.

The data set can be downloaded from here.

We’ll use the pandas library to load the data set from the csv file and plot it:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# Load the decibel readings, using the first column (TimeIndex) as the index
df = pd.read_csv('restaurant_decibel_level.csv', header=0, index_col=[0])

Let’s print the top 10 rows:

df.head(10)

             Decibel
TimeIndex
0          55.931323
40         57.779260
80         62.956952
140        65.158100
180        60.325242
220        45.411725
262        55.958807
300        62.021807
340        62.222563
380        56.156684

Let’s plot all 5000 values in the series:

Decibel level at a restaurant

Let’s fetch and plot the auto-correlation coefficients for the first 40 lags. We’ll use the statsmodels library to do that.

import statsmodels.graphics.tsaplots as tsa

# Plot the auto-correlations up to lag 40 together with the 95% confidence region
tsa.plot_acf(df['Decibel'], lags=40, alpha=0.05, title='Auto-correlation coefficients for lags 1 through 40')
plt.show()

The alpha=0.05 tells statsmodels to also plot the 95% confidence interval region. We get the following plot:

Auto-correlation plot for the decibel level time series

As we can see, the time series contains significant auto-correlations up through lag 17. Incidentally, the auto-correlation at lag 0 is always 1.0, as a value is always perfectly correlated with itself.

There is a wave-like pattern in the auto-correlation plot, which indicates that there could be some seasonality contained in the data. We can try to identify and isolate the seasonality by decomposing the time series into its trend, seasonality and noise components.

Related read: What is time series decomposition and how does it work

For now we’ll focus on the noise portion. The bottom line is that this time series, in its current form, does not appear to be pure white noise.

Next, we’ll run two more tests on the time series to confirm this.

The Chi-squared test for white noise detection

The Chi-squared test is based on this powerful result in statistics: the sum of squares of k independent standard normal random variables is a Chi-squared distributed random variable with k degrees of freedom.

Image: Wikimedia, under CC BY 3.0

The actual test is called the Box-Pierce test and its test statistic is called the Q statistic. Its formula is as follows:

Q = n * (r_1² + r_2² + … + r_k²)

Box-Pierce test statistic

Under white noise conditions each r_j is approximately N(0, 1/n), so sqrt(n) * r_j is approximately standard normal and Q, the sum of k such squared terms, is approximately Chi-square distributed with k degrees of freedom. Its expected value under white noise is therefore about k, and much larger values point to auto-correlation.

For any given time series, one can check whether the value of Q is larger than white noise would plausibly produce by looking up the p-value of the test statistic in the Chi-square tables for k degrees of freedom. Usually, a p-value of less than 0.05 indicates a significant auto-correlation that cannot be attributed to chance.
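To see the statistic at work, here is a hedged NumPy sketch that computes Q by hand for 40 lags and compares it with the tabulated 95th percentile of the Chi-square(40) distribution instead of computing a p-value. The function name `box_pierce_q`, the synthetic series and the hard-coded critical value are assumptions made for the example.

```python
import numpy as np

def box_pierce_q(y, k):
    """Box-Pierce statistic Q = n * sum_{j=1..k} r_j^2."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    denom = np.sum(d * d)
    r = np.array([np.sum(d[j:] * d[:-j]) / denom for j in range(1, k + 1)])
    return float(n * np.sum(r ** 2))

CHI2_95_DF40 = 55.76   # 95th percentile of Chi-square(40), from standard tables

rng = np.random.default_rng(21)
white = rng.normal(size=5000)             # synthetic white noise
walk = np.cumsum(rng.normal(size=5000))   # synthetic random walk

for name, data in [("white noise", white), ("random walk", walk)]:
    q = box_pierce_q(data, 40)
    verdict = "reject white noise" if q > CHI2_95_DF40 else "cannot reject white noise"
    print(f"{name}: Q = {q:.2f} -> {verdict}")
```

For white noise, Q should land in the neighborhood of 40, its expected value; for the random walk it is larger by orders of magnitude.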

The Ljung-Box test for white noise detection

The Ljung-Box test improves upon the Box-Pierce test to obtain a test statistic having a distribution that is closer to the Chi-square distribution than the Q statistic. The test statistic of the Ljung-Box test is calculated as follows, and it is also Chi-square(k) distributed:

Q* = n * (n + 2) * [ r_1²/(n-1) + r_2²/(n-2) + … + r_k²/(n-k) ]

Ljung-Box test statistic

Here, n is the number of data points in the time series and k is the number of time lags to be considered. As with the Box-Pierce test, if the underlying data set is white noise, this statistic is approximately Chi-square(k) distributed, so its expected value is about k. Again, a p-value of less than 0.05 indicates a significant auto-correlation that cannot be attributed to chance.
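A hand-rolled version of the Ljung-Box statistic makes the weighting of each lag explicit. This is a sketch only; the function name `ljung_box_q` and the synthetic white-noise series are assumptions, and in practice the statsmodels function used later in this article would be the natural choice.

```python
import numpy as np

def ljung_box_q(y, k):
    """Ljung-Box statistic Q* = n * (n + 2) * sum_{j=1..k} r_j^2 / (n - j)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    denom = np.sum(d * d)
    total = 0.0
    for j in range(1, k + 1):
        r_j = np.sum(d[j:] * d[:-j]) / denom   # lag-j auto-correlation
        total += r_j ** 2 / (n - j)
    return float(n * (n + 2) * total)

rng = np.random.default_rng(8)
short = rng.normal(size=60)       # short white-noise series
long_ = rng.normal(size=6000)     # long white-noise series

for name, data in [("n = 60", short), ("n = 6000", long_)]:
    print(name, "Ljung-Box Q* at 10 lags:", round(ljung_box_q(data, 10), 3))
```

Each squared coefficient is scaled by (n + 2)/(n - j), which is always greater than 1, so the Ljung-Box value is always a little larger than the Box-Pierce value for the same data. The gap matters most for short series and becomes negligible as n grows.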

Example: Testing for white noise using the Ljung-Box test in Python

Let’s run the Ljung-Box test on the restaurant decibel level data set. We will test up to 40 lags, and we’ll ask the function to also run the Box-Pierce test.

import statsmodels.stats.diagnostic as diag

# Ljung-Box test at lag 40; boxpierce=True also reports the Box-Pierce statistic
diag.acorr_ljungbox(df['Decibel'], lags=[40], boxpierce=True)

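We get the following output (the statsmodels release used here returns a tuple of arrays; more recent releases return the same four values as a DataFrame with columns lb_stat, lb_pvalue, bp_stat and bp_pvalue):

(array([13172.80554476]), array([0.]), array([13156.42074648]), array([0.]))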

The value 13172.80554476 is the value of the test statistic for the Ljung-Box test and 0.0 is its p-value as per the Chi-square(k=40) table.

The value 13156.42074648 is the test statistic of the Box-Pierce test and 0.0 is its p-value as per the Chi-square(k=40) tables.

As we can see, both p-values are less than 0.01 and so we can say with 99% confidence that the restaurant decibel level time series is not pure white noise.

Earlier on, we introduced Random Walks as a special case of the White Noise model and pointed out how easy it is to mistake them for a pattern or trend that can be predicted.

We’ll look at how to avoid making this mistake by applying a technique that will bring out the true random nature of the Random Walk.

Detecting Random Walks

Random walks are often highly auto-correlated. In fact, a random walk is a running sum of white noise, so each value carries all of the noise accumulated before it.

The white noise detection tests presented above will latch on to these auto-correlations and conclude that the time series is not white noise.

The remedy is to take the first difference of the time series that is suspected to be a random walk, and run the white noise tests on the differenced series.

If the original time series is a random walk, its first difference is pure white noise.
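Before turning to the real data set, this claim can be checked on a synthetic walk. The sketch below is an illustration under assumed settings (standard normal steps, 5000 points); the helper `acf` is a hand-written sample auto-correlation, not a library call.

```python
import numpy as np

def acf(y, max_lag):
    """Sample auto-correlation coefficients r_1 .. r_max_lag."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[k:] * d[:-k]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(13)
noise = rng.normal(size=5000)
walk = np.cumsum(noise)            # synthetic random walk
diffed = np.diff(walk)             # first difference: Y_i - Y_(i-1)

bound = 1.96 / np.sqrt(len(diffed))
print("differences equal the original noise:", np.allclose(diffed, noise[1:]))
print("walk   r_1..r_3:", acf(walk, 3))
print("diffed r_1..r_3:", acf(diffed, 3), "95% bound:", bound)
```

Differencing simply undoes the running sum, so the differenced series is the original noise again: its auto-correlations collapse from nearly 1 to values scattered around zero, on the scale of the 95% bound.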

Let’s illustrate this:

We’ll start by loading a data set that is suspected to be a random walk. The data set can be downloaded from here.

df = pd.read_csv('random_walk.csv', header=0, index_col=[0])

Let’s plot it to see what it looks like:

df.plot()
plt.show()

Let’s run the Ljung-Box white noise test on this data:

diag.acorr_ljungbox(df['Y_i'], lags=[40], boxpierce=True)

We get the following result:

(array([393833.91252517]), array([0.]), array([392952.07675659]), array([0.]))

The p value of 0.0 indicates that we must strongly reject the null hypothesis that the data is white noise. Both Ljung-Box and Box-Pierce tests think that this data set has not been generated by a pure random process.

This is a misleading result: the series is indeed not white noise, yet it was generated by a purely random process.

Let’s see if things change after we take the first difference of the data, i.e. we create a new data set with diff_Y_i = Y_i - Y_(i-1):

diff_Y_i = df['Y_i'].diff()
# Drop the NaN in the first row
diff_Y_i = diff_Y_i.dropna()

Let’s plot the diff-ed data set:

diff_Y_i.plot()
plt.show()

We now see a very different picture:

The differenced data set

Here is the zoomed in view:

Zoomed-in view of the differenced data set

Let’s run the Ljung-Box test on the differenced data set:

diag.acorr_ljungbox(diff_Y_i, lags=[40], boxpierce=True)

We get the following output:

(array([32.93405364]), array([0.77822417]), array([32.85051846]), array([0.78137548]))

Notice that this time the test statistic values, 32.934 reported by the Ljung-Box test and 32.850 reported by the Box-Pierce test, are much smaller. The corresponding p-values looked up in the Chi-square(k=40) tables are 0.778 and 0.781 respectively, which are well above 0.05. We therefore cannot reject the null hypothesis that the data (i.e. the differenced time series) is pure white noise.

The conclusion to be drawn from this exercise is that one should not fit anything except the White Noise model to this differenced data.

Summary

The white noise model can be used to represent the nature of noise in a data set.

Testing for white noise is one of the first things that a data scientist should do so as to avoid spending time on fitting models on data sets that offer no meaningfully extractable information.

If a data set is not white noise, then after fitting a model to the data, one should run a white noise test on the residual errors to get a sense of how much information the model has been able to extract from the data.

For time series data, auto-correlation plots and the Ljung-Box test offer two useful techniques for determining whether the time series is, in reality, just white noise.

References, Citations and Copyrights

Data set of restaurant decibel levels is Copyright Sachin Date under CC-BY-NC-SA.

Amgen stock price chart is from stockcharts.com under these terms of use.

Anderson, R. L., Distribution of the Serial Correlation Coefficient, Annals of Mathematical Statistics, Vol. 13, No. 1 (1942), pp. 1–13.

Bartlett, M. S., On the Theoretical Specification and Sampling Properties of Autocorrelated Time-Series, Supplement to the Journal of the Royal Statistical Society, Vol. 8, No. 1 (1946), pp. 27–41.

Quenouille, M. H., The Joint Distribution of Serial Correlation Coefficients, The Annals of Mathematical Statistics, Vol. 20, No. 4 (Dec., 1949), pp. 561–571.

Hyndman, R. J., Athanasopoulos, G., Forecasting: Principles and Practice, OTexts.

All images in this article are copyright Sachin Date under CC-BY-NC-SA, unless a different source and copyright are mentioned underneath the image.

Translated from: https://towardsdatascience.com/the-white-noise-model-1388dbd0a7d
