Visualizing Time Series Data with Python

Time series data consists of information gathered sequentially over time, illustrating how variables change at different moments. Examples include daily stock prices or hourly temperature readings. This type of data is crucial in sectors such as finance, pharmaceuticals, social media, and research, as its analysis can reveal trends, seasonal patterns, and behaviors. Such insights are valuable for forecasting and making informed decisions.

Key Concepts in Time Series Analysis

Trend: This indicates the general direction in which a time series is moving over a long period, showing whether values are increasing, decreasing, or remaining stable.
Seasonality: Refers to repetitive patterns or cycles that occur at regular intervals in a time series, often corresponding to days, weeks, months, or seasons.
Moving Average: A method used to smooth out short-term fluctuations and highlight longer-term trends or patterns in the data.
Noise: Represents the irregular and unpredictable components in a time series that do not follow a pattern.
Differencing: Involves calculating the difference between consecutive data points to remove trends or seasonality, helping achieve stationarity.
Stationarity: A stationary time series has statistical properties such as mean, variance, and autocorrelation that remain constant over time.
Order: This refers to the number of times differencing is applied to a dataset to achieve stationarity.
Autocorrelation: The correlation between a time series and a lagged copy of itself, measured at one or more lags.
Resampling: A technique used to change the frequency of data observations in time series analysis.
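Several of these concepts can be sketched on a small synthetic series. The example below is only an illustration: the series, its weekly cycle, and all variable names are assumptions made for the sketch, not part of the stock dataset used later.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (illustrative only): linear trend + weekly cycle + noise.
rng = np.random.default_rng(0)
n_days = 140
dates = pd.date_range("2023-01-01", periods=n_days, freq="D")
trend = 0.5 * np.arange(n_days)
seasonal = 3.0 * np.sin(2 * np.pi * np.arange(n_days) / 7)
noise = rng.normal(scale=0.5, size=n_days)
series = pd.Series(trend + seasonal + noise, index=dates, name="value")

# Moving average: a 7-day window averages out the weekly cycle.
moving_avg = series.rolling(window=7).mean()

# Differencing (order 1): removes the linear trend, leaving cycle + noise.
first_diff = series.diff()

# Autocorrelation of the differenced series at lag 7 (the assumed weekly period).
lag7_autocorr = first_diff.autocorr(lag=7)

# Resampling: change the frequency from daily to weekly means.
weekly = series.resample("W").mean()
```

After differencing, the mean of first_diff sits close to the slope of the trend (0.5 here), and the lag-7 autocorrelation stays high because the weekly cycle survives first-order differencing.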

Types of Time Series Data

Continuous Time Series: Data recorded at regular intervals with a continuous range of values, such as temperature or stock prices.
Discrete Time Series: Data with distinct values or categories, recorded at specific time points, like event counts or categorical statuses.

Visualization Approaches

Use line plots or area charts for continuous data to emphasize trends and fluctuations.
Use bar charts or histograms for discrete data to show frequency or distribution across categories.

Practical Time Series Visualization with Python

Let's explore how to visualize time series data with Python using a stock dataset.

Step 1: Installing and Importing Libraries

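We'll use NumPy, pandas, Seaborn, and Matplotlib for data handling and plotting, plus statsmodels for the autocorrelation plot and the stationarity test. If any of them are missing, install them first (for example with pip).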

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.stattools import adfuller

Step 2: Loading the Dataset

Load the dataset, parsing the Date column and using it as the index so that the DataFrame has a DatetimeIndex.

df = pd.read_csv("stock_data.csv", parse_dates=True, index_col="Date")
df.head()
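Before plotting, it can help to confirm that the index really was parsed as dates and that the rows are in chronological order. The sketch below uses a tiny inline CSV as a stand-in for stock_data.csv; the column names and every value in it are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for stock_data.csv (values are made up).
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Volume\n"
    "2023-01-03,100.0,102.5,99.5,101.0,12000\n"
    "2023-01-02,99.0,101.0,98.0,100.0,15000\n"
    "2023-01-04,101.0,103.0,100.5,102.0,11000\n"
)
sample = pd.read_csv(sample_csv, parse_dates=["Date"], index_col="Date")

# Resampling and rolling windows rely on a DatetimeIndex.
is_datetime_index = isinstance(sample.index, pd.DatetimeIndex)

# Sort chronologically in case the file is not ordered by date.
sample = sample.sort_index()

# Count missing values per column before any analysis.
missing_per_column = sample.isna().sum()
```

The same three checks can be run on df directly once the real file is loaded.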

Step 3: Cleaning the Data

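Remove columns that are not needed for the analysis. Here, 'Unnamed: 0' is a leftover index column of the kind pandas creates when a DataFrame is written to CSV together with its index.

# errors='ignore' keeps this cell safe to re-run after the column has been dropped
df.drop(columns='Unnamed: 0', inplace=True, errors='ignore')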
df.head()

Step 4: Plotting High Stock Prices

Visualize the daily high prices using a line graph, as the High column represents continuous data.

sns.set(style="whitegrid") 
plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x='Date', y='High', label='High Price', color='blue')
plt.xlabel('Date')
plt.ylabel('High')
plt.title('Share Highest Price Over Time')
plt.show()

Step 5: Resampling Data

Daily data can be noisy. Resampling to a lower frequency, here monthly averages, smooths out day-to-day fluctuations and makes longer-term trends and patterns easier to see.

# 'ME' (month end) requires pandas 2.2 or later; older versions use 'M'
df_resampled = df.resample('ME').mean(numeric_only=True)
sns.set(style="whitegrid") 
plt.figure(figsize=(12, 6))  
sns.lineplot(data=df_resampled, x=df_resampled.index, y='High', label='Month Wise Average High Price', color='blue')
plt.xlabel('Date (Monthly)')
plt.ylabel('High')
plt.title('Monthly Resampling Highest Price Over Time')
plt.show()
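Resampling is not limited to the mean. As a minimal sketch, the example below aggregates a synthetic daily series to monthly mean, maximum, and minimum. The helper month_end_alias is hypothetical and only exists to pick an alias that works on both older and newer pandas versions; the data is randomly generated, not real stock prices.

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset

def month_end_alias():
    """Hypothetical helper: return 'ME' if this pandas version accepts it, else 'M'."""
    try:
        to_offset("ME")
        return "ME"
    except ValueError:
        return "M"

# Synthetic daily "High" values (random walk around 100), for illustration only.
rng = np.random.default_rng(1)
idx = pd.date_range("2023-01-01", "2023-03-31", freq="D")
daily = pd.DataFrame({"High": 100 + rng.normal(size=len(idx)).cumsum()}, index=idx)

# One row per month, with several aggregations side by side.
monthly = daily["High"].resample(month_end_alias()).agg(["mean", "max", "min"])
```

Which aggregation is appropriate depends on the column: a mean suits prices, while a sum would usually be the natural choice for traded volume.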

Step 6: Detecting Seasonality with Autocorrelation

Detect seasonality using the autocorrelation function (ACF) plot. Regular peaks in the ACF plot suggest seasonality.

if 'Date' not in df.columns:
    print("'Date' is already the index or not present in the DataFrame.")
else:
    df.set_index('Date', inplace=True)

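# plot_acf creates its own figure unless an Axes is passed, so create one explicitly
fig, ax = plt.subplots(figsize=(12, 6))
plot_acf(df['High'], lags=40, ax=ax)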
plt.xlabel('Lag')
plt.ylabel('Autocorrelation')
plt.title('Autocorrelation Function (ACF) Plot')
plt.show()
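The visual check can be complemented with numbers. The sketch below computes autocorrelations with pandas on a synthetic series that has an assumed 12-step cycle, and flags lags whose value exceeds the approximate 95% bound of 1.96 / sqrt(n) for white noise. Note that Series.autocorr uses a Pearson correlation between the series and its shifted copy, so the values can differ slightly from those drawn by plot_acf.

```python
import numpy as np
import pandas as pd

# Synthetic series with a 12-step cycle plus noise (illustrative assumption).
rng = np.random.default_rng(42)
n = 240
t = np.arange(n)
series = pd.Series(2.0 * np.sin(2 * np.pi * t / 12) + rng.normal(size=n))

# Approximate 95% bound for the autocorrelation of white noise.
bound = 1.96 / np.sqrt(n)

# Autocorrelation at lags 1..40.
autocorr_by_lag = {lag: series.autocorr(lag=lag) for lag in range(1, 41)}

# Lags with clearly positive autocorrelation; peaks recurring at multiples
# of the same lag are the numeric signature of seasonality.
significant_positive_lags = [
    lag for lag, value in autocorr_by_lag.items() if value > bound
]
```

For this synthetic series the flagged lags cluster around multiples of 12, while lags half a cycle away (such as 6) are strongly negative.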

Step 7: Testing Stationarity with ADF Test

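Use the Augmented Dickey-Fuller (ADF) test to check for stationarity. The null hypothesis is that the series has a unit root, meaning it is non-stationary. A p-value below a chosen significance level (commonly 0.05), or an ADF statistic more negative than the critical values, is evidence against the null and in favor of stationarity.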

result = adfuller(df['High'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
print('Critical Values:', result[4])

Step 8: Differencing to Achieve Stationarity

Apply differencing to remove trends or seasonality and achieve stationarity.

df['high_diff'] = df['High'].diff()

plt.figure(figsize=(12, 6))
plt.plot(df['High'], label='Original High', color='blue')
plt.plot(df['high_diff'], label='Differenced High', linestyle='--', color='green')
plt.legend()
plt.title('Original vs Differenced High')
plt.show()
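A short worked example shows what the order of differencing means in practice. With made-up numbers, one difference turns a quadratic trend into a linear one, a second difference turns it into a constant, and differencing at the seasonal lag removes a repeating pattern.

```python
import numpy as np
import pandas as pd

# Quadratic trend: 2t^2 + 3t + 1 (made-up coefficients).
t = np.arange(10)
quadratic = pd.Series(2.0 * t**2 + 3.0 * t + 1.0)

# Order 1: the result still trends linearly (4t + 1).
d1 = quadratic.diff()

# Order 2: differencing twice leaves a constant (4.0).
d2 = quadratic.diff().diff()

# Seasonal differencing: subtract the value one full cycle (4 steps) earlier.
seasonal = pd.Series(np.tile([1.0, 5.0, 2.0, 8.0], 3))
seasonal_diff = seasonal.diff(periods=4)
```

Each round of differencing costs observations at the start of the series: d2 begins with two NaN values and seasonal_diff with four.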

Step 9: Smoothing Data with Moving Average

Calculate the moving average to smooth the data, providing a clearer view of the underlying trends.

window_size = 120
df['high_smoothed'] = df['High'].rolling(window=window_size).mean()

plt.figure(figsize=(12, 6))
plt.plot(df['High'], label='Original High', color='blue')
plt.plot(df['high_smoothed'], label=f'Moving Average (Window={window_size})', linestyle='--', color='orange')
plt.xlabel('Date')
plt.ylabel('High')
plt.title('Original vs Moving Average')
plt.legend()
plt.show()
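The window size controls a trade-off: a larger window gives a smoother line but reacts more slowly and leaves more missing values at the start. The sketch below illustrates this on a synthetic random walk; the window sizes 7 and 60 are arbitrary choices for the comparison.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices (random walk around 100), for illustration only.
rng = np.random.default_rng(7)
idx = pd.date_range("2022-01-01", periods=365, freq="D")
prices = pd.Series(100 + rng.normal(size=365).cumsum(), index=idx)

short_ma = prices.rolling(window=7).mean()    # first 6 values are NaN
long_ma = prices.rolling(window=60).mean()    # first 59 values are NaN

# min_periods=1 averages whatever is available, so no leading NaN values.
partial_ma = prices.rolling(window=60, min_periods=1).mean()

# Standard deviation of day-to-day changes as a rough measure of smoothness.
roughness = {
    "raw": prices.diff().std(),
    "short": short_ma.diff().std(),
    "long": long_ma.diff().std(),
}
```

The roughness values shrink as the window grows, which matches what the plot shows visually.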

Step 10: Original Data Vs Differenced Data

Compare the original data with the differenced data to observe changes.

df_combined = pd.concat([df['High'], df['high_diff']], axis=1)
print(df_combined.head())

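Differencing leaves a NaN in the first row, and the ADF test cannot handle missing values. Drop that row, then run the ADF test again on the differenced series to check whether it is now stationary.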

df.dropna(subset=['high_diff'], inplace=True)
result = adfuller(df['high_diff'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
print('Critical Values:', result[4])