Time-series Analysis Part I

Pravin Borate
May 3, 2020

Time Series: A sequence of information that attaches a time period to each value is called a time series.

e.g. stock market prices

To analyze a time series:

All time periods must be equal and clearly defined, which results in a constant frequency.

Let’s start exploring the various challenges encountered during time-series analysis.

Filling null values in time-series data:

Since a time series depends entirely on the time instant and the value at that instant, filling null values in time-series data is a very challenging task.

The following three methods generally give good results:
1. Front filling
2. Back filling
3. Assigning average values


Front Filling:

As the name suggests, in front filling we assign the value of the previous period.

Back Filling:

We assign the value of the next period.

Assigning average values:

Assign the average of the observed values to all the missing entries.
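As a quick sketch, here is how the three fills above might look with pandas on a small hypothetical series (the dates and values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with two missing values.
idx = pd.date_range("2020-01-01", periods=6, freq="D")
s = pd.Series([10.0, np.nan, 12.0, np.nan, 14.0, 15.0], index=idx)

front = s.ffill()          # front filling: carry the previous value forward
back = s.bfill()           # back filling: take the next period's value
avg = s.fillna(s.mean())   # assign the average of the observed values
```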

Splitting the dataset:

In machine learning we normally split our dataset into training and testing sets, and at the time of splitting we usually shuffle the data. But time-series data relies on keeping the chronological order of the values, so we cannot shuffle it.

So,

Training set: From the beginning up to some cut-off point.
Testing set: From the cut-off point until the end of the dataset.
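A minimal sketch of such a chronological split with pandas (the 80/20 cut-off here is just an example choice, not a rule):

```python
import numpy as np
import pandas as pd

# Hypothetical series of 100 daily observations.
idx = pd.date_range("2020-01-01", periods=100, freq="D")
data = pd.Series(np.arange(100, dtype=float), index=idx)

# No shuffling: the cut-off preserves chronological order.
cutoff = int(len(data) * 0.8)
train = data.iloc[:cutoff]   # from the beginning up to the cut-off
test = data.iloc[cutoff:]    # from the cut-off until the end
```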

Noise in the datasets:

The following are the types of noise in datasets:

I. White Noise
II. Random Walk

White Noise:

A special type of time series in which the data does not follow any pattern.
The following are the conditions for white noise:
1. Constant mean
2. Constant variance
3. No autocorrelation, i.e. no clear relationship between past and present values

[Plot: a white-noise time series]

We can see how the data scatters around a constant mean.
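White noise is easy to simulate, which makes the three conditions above concrete; here is a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
# White noise: constant mean (0), constant variance (1), no autocorrelation.
wn = rng.normal(loc=0.0, scale=1.0, size=1000)

print(wn.mean())  # close to 0
print(wn.std())   # close to 1
```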

Random Walk:

It's a special type of series in which values tend to persist over time and the difference between consecutive periods is simply white noise,
i.e. the best estimator for today’s value is yesterday’s value.
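Since a random walk is just the cumulative sum of white noise, a short NumPy sketch makes the definition concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.normal(size=1000)  # white-noise innovations
rw = np.cumsum(noise)          # random walk: each value = previous value + noise

# Differencing the walk recovers the white noise.
diffs = np.diff(rw)
```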

Let’s go further,

To conduct proper time-series analysis, it’s important to know whether the data follows a stationary or a non-stationary process.

So the question is: how can we find out whether the data is stationary or non-stationary?

We can identify this using the following simple tests.

Stationarity:

If consecutive samples of the data, taken with the same bin size, have the same covariance, then the data is stationary.

The following are the covariance-stationarity assumptions:
1. Constant mean
2. Constant variance
3. cov(Xn, Xn+k) = cov(Xm, Xm+k)
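One informal way to eyeball the first two assumptions (a rough check added here for illustration, not a formal test) is to plot rolling statistics: for stationary data the rolling mean and rolling variance should stay roughly flat over time.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
series = pd.Series(rng.normal(size=500))  # stationary by construction

# For stationary data, both of these should hover around constant levels.
roll_mean = series.rolling(window=50).mean()
roll_var = series.rolling(window=50).var()
```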

David Dickey and Wayne Fuller gave us the Dickey-Fuller test to identify whether our data follows a stationary process.

This is how we can carry out the Dickey-Fuller test using the statsmodels library.

Here, the first value in the result is the test statistic.

If this value is greater than the critical value, we cannot reject the null hypothesis of non-stationarity.

The second value is the p-value. Note that the null hypothesis of the Dickey-Fuller test is that the data is non-stationary, so a p-value of 0.99 means we cannot reject the null hypothesis and the data is non-stationary; only a small p-value (e.g. below 0.05) lets us conclude the data is stationary.

Another important thing we have to know about our data is whether it follows seasonality.

Seasonality:

In seasonality, trends will appear on a cyclic basis.

eg. Temperature rise and fall

To check this, there are multiple approaches; one of them is decomposition.

In decomposition we split the series into three effects:
1. Trend: is there any pattern?
2. Seasonal: is there any cyclic effect present?
3. Residual: the error of prediction

In naive decomposition there are two main approaches:
1. Additive
2. Multiplicative

In additive decomposition, observed value = Trend + Seasonal + Residual,

and similarly for multiplicative decomposition, observed value = Trend * Seasonal * Residual.

If the seasonal component of the decomposition plot shows no cyclic pattern, we can say the data does not follow seasonality.

This is how we can identify seasonality.

Correlation Between past and Present Values:

It’s important to find out whether there is any correlation between past and present values of the dataset.

And we can achieve this by using the following functions.
1. Autocorrelation Function
2. Partial Autocorrelation Function

Autocorrelation Function:

This shows the relationship between past values and current values.

[Plots: a series with autocorrelation vs. a series with no autocorrelation]

Partial Autocorrelation:


The major difference between the ACF and the PACF is that the ACF measures the accumulated effect of past lags on current values, while the PACF measures the direct effect of each lag.

GitHub Link for code: https://github.com/Pravin1Borate/Time-Series-Analysis

My LinkedIn Profile: https://www.linkedin.com/in/pravin-borate-14a43b133/

Kaggle profile: https://www.kaggle.com/pravinborate
