{"id":172220,"date":"2023-10-16T14:24:40","date_gmt":"2023-10-16T13:24:40","guid":{"rendered":"https:\/\/liora.io\/en\/?p=172220"},"modified":"2026-02-06T08:53:35","modified_gmt":"2026-02-06T07:53:35","slug":"the-var-model-from-business-approach-to-results-plotting","status":"publish","type":"post","link":"https:\/\/liora.io\/en\/the-var-model-from-business-approach-to-results-plotting","title":{"rendered":"The VAR model: From business approach to results plotting"},"content":{"rendered":"<ol>\n<li><a href=\"\/#introduction\">Introduction<\/a><\/li>\n<li><a href=\"\/#definition\">Definition of the VAR Model: Equation &amp; Significance<\/a><\/li>\n<li><a href=\"\/#import\">Library Import and Setup<\/a><\/li>\n<li><a href=\"\/#stationnarite\">Stationarity: What Is It and Why Is It Important<\/a><\/li>\n<li><a href=\"\/#modele\">The Model and Lag\/Variable Convergence<\/a>\n<ol>\n<li><a href=\"\/#Selectionner_des_variables\">Selecting Variables<\/a><\/li>\n<li><a href=\"\/#Selection_du_lag_optimal\">Selecting the Optimal Lag<\/a><\/li>\n<\/ol>\n<\/li>\n<li><a href=\"\/#prediction\">Prediction with the Best Model and Result Visualization<\/a><\/li>\n<li><a href=\"\/#Conclusion\">Conclusion<\/a><\/li>\n<\/ol>\n<h3>1. Introduction<\/h3>\n<p><strong>Welcome to this explanatory article about the VAR model. We will delve into the &#8220;fundamentals&#8221; of this model but not the general aspects of time series (except for stationarity).<\/strong><\/p>\n<p>The dataset used is available <a href=\"https:\/\/www.kaggle.com\/datasets\/albertovidalrod\/electricity-consumption-uk-20092022?select=historic_demand_2009_2023_noNaN.csv\">here<\/a>. 
And here is my GitHub: <a href=\"https:\/\/github.com\/gambitl\/Article_VAR_FR_Public\">https:\/\/github.com\/gambitl\/Article_VAR_FR_Public<\/a><\/p>\n<p>By the end of this article, you will know:<\/p>\n<ul>\n<li>What the VAR model is and why it matters<\/li>\n<li>How and why to prepare your <a href=\"https:\/\/liora.io\/dataset-definition\">dataset<\/a> to use the VAR model<\/li>\n<li>How to choose the right parameter (lag count) for your dataset to minimize an error metric, not just a statistical metric<\/li>\n<li>Why and how to test the relationship between variables and remove those that add noise to the model<\/li>\n<li>How to automate the two previous steps and converge towards an optimal result that combines the most effective parameter and the most suitable dataset<\/li>\n<\/ul>\n<p>At each stage, I&#8217;ll highlight the theoretical points first, then the &#8220;in-house&#8221; functions, and finally the code that calls the functions. Each function is commented and explained within the code itself.<\/p>\n<p>At the end of the article, you&#8217;ll find the complete code in a single block.<\/p>\n<p>In this example, I&#8217;m using the VAR model to predict ONE particular variable.<\/p>\n<h3>2. VAR model definition: equation &amp; significance<\/h3>\n<p>The Vector Autoregression (VAR) model is a time series model that predicts several variables over time. Each variable&#8217;s prediction contributes to the prediction of the others. 
So, if we were to write it in a linear format, we&#8217;d have this:<\/p>\n<p>Let there be 2 variables, Y and X, and (t) a moment in time.<\/p>\n<p>Y(t+1) = f(Y(t), X(t))<\/p>\n<p>X(t+1) = f(X(t), Y(t))<\/p>\n<p>Y(t+1) = \u03b11 + \u03b21Y(t) + \u03b22X(t) + \u03b51<\/p>\n<p>X(t+1) = \u03b12 + \u03b23X(t) + \u03b24Y(t) + \u03b52<\/p>\n<p>With \u03b1 the constants, \u03b2 the coefficients associated with the variables, \u03b5 the errors.<\/p>\n<p>Based on historical data, the model calculates the constants and coefficients for each variable.<\/p>\n<p>In addition, the model includes &#8220;lags&#8221; (\u03bb), which allow all values of X, Y up to X(t-\u03bb), Y(t-\u03bb) to be included in the calculation of X(t+1), Y(t+1). In the previous system, the next value of the variable (X) is calculated as a function of the previous value of (X) and the previous value of (Y). The lag (\u03bb) is therefore equal to 1.<\/p>\n<p>It is these lags that constitute the parameter to be optimized in this model, i.e. how many past periods of history will make up our equations.<\/p>\n<h3>3. 
Library imports and set-up<\/h3>\n<p>We need the following imports:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>import pandas as pd\nimport numpy as np\nfrom sklearn.model_selection import TimeSeriesSplit\nfrom sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error\nfrom sklearn.preprocessing import MinMaxScaler\nimport seaborn as sns\nimport matplotlib.pyplot as plt\nimport matplotlib as mpl\nfrom statsmodels.tsa.vector_ar.var_model import VAR\nfrom statsmodels.tsa.stattools import grangercausalitytests, adfuller\nfrom scipy.stats import boxcox, norm, probplot, normaltest\nfrom scipy.special import inv_boxcox<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>Loading the data:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>df = pd.read_csv(\"path.csv\")\ndf['settlement_date'] = pd.to_datetime(df['settlement_date'])\ndf = df.set_index('settlement_date')\ndf = df.drop(['settlement_period', 'period_hour'], axis = 1)\nexog = df[['is_holiday']].copy()\ndf = df.drop(['is_holiday'], axis=1)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>Parameters:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>target_variable = \"tsd\" #the feature we ultimately want to predict\nn_period_predict = 48 * 7 #the number of periods we want to predict. I want to predict 7 days and there are 48 data points per day<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>Preparation\/cleaning:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>mask = (df.index > '2022-01-01') #I will use the data post 2022\ndf_pour_var = df.loc[mask]\nexog_pour_var = exog.loc[mask]\ndf_pour_var = df_pour_var +1 #We add 1 everywhere so that no value equals 0, which eases the Box-Cox transformation\ninitial_testing_set = df_pour_var[-n_period_predict:] #there will be a lot of transformations so we keep an untouched testing set<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<h3>4. 
Stationarity: what it is and why it&#8217;s important<\/h3>\n<p>We define (weak) stationarity as follows:<\/p>\n<p>Whatever the moment (t) in time, the expectation of our variable is always the same: there is no trend.<br \/>Whatever the moment (t) in time, the variance of our variable is always the same and is not infinite.<br \/>Whatever the moments in time, the auto-correlation between two moments is a function only of the time lag between them (their position does not change the value of the auto-correlation).<\/p>\n<p>Without going into detail, it has been shown that a regression on a non-stationary variable has invalid parameters (they follow a Brownian motion, i.e. a very irregular path, to put it simply). I recommend this thread on ResearchGate for more details.<\/p>\n<p>The first step in creating a time series model is to make the series stationary. There are several methods for doing this, all of which involve transforming the data:<\/p>\n<p>The various normalization methods, such as MinMaxScaler()<br \/>The Box-Cox transformation, widely used and common in the case of time series<br \/>Differencing, which consists in subtracting the previous value of the series from the current one. In my experience, it is rare for a series to remain non-stationary after one or two rounds of differencing. Note that in ARIMA\/ARMA models, the library usually takes care of differencing the series itself, but this is not the case here.<\/p>\n<p>Ideally, we&#8217;d like to transform a series using Box-Cox, then difference it if required.<\/p>\n<p>Unfortunately, not all series can be transformed using Box-Cox (negative values). I therefore propose the following code, which will test the whole set:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>def box_cox_test(dataframe_to_test):\n    \"\"\"_summary_\n    Args:\n        dataframe_to_test (dataframe)\n    Returns:\n        list_box_cox : the list of features you can transform thanks to boxcox\n        list_non_transformed : the list of features you CAN'T transform thanks to boxcox\n    \"\"\"\n    list_non_transformed = []\n    list_box_cox = []\n    for col in dataframe_to_test:\n        data = dataframe_to_test[col]\n        if (data <= 0).values.any(): #boxcox requires strictly positive data\n            list_non_transformed.append(col)\n            continue\n        fig = plt.figure()\n        ax1 = fig.add_subplot(221)\n        prob = probplot(data, dist=norm, plot=ax1)\n        ax1.set_xlabel('')\n        ax1.set_title('Probplot against normal distribution')\n        ax21 = fig.add_subplot(222)\n        ax21.hist(data)\n        ax21.set_xlabel(col)\n        ax21.set_title('distplot')\n        # normality test\n        stat, p = normaltest(data)\n        print('Statistics=%.3f, p=%.3f' % (stat, p))\n        # interpret\n        alpha = 0.05\n        if p > alpha:\n            print('Sample looks Gaussian (fail to reject H0)')\n        else:\n            print('Sample does not look Gaussian (reject H0)')\n        xt, _ = boxcox(data)\n        if not np.isfinite(xt).all(): #skip the feature if the transform produced any non-finite value\n            list_non_transformed.append(col)\n            continue\n        list_box_cox.append(col)\n        ax2 = fig.add_subplot(223)\n        prob = probplot(xt, dist=norm, plot=ax2)\n        ax2.set_title('Probplot after Box-Cox transformation')\n        ax22 = fig.add_subplot(224)\n        
ax22.hist(xt)\n        ax22.set_xlabel(col)\n        ax22.set_title('distplot')\n        stat, p = normaltest(xt)\n        print('Statistics=%.3f, p=%.3f' % (stat, p))\n        # interpret\n        alpha = 0.05\n        if p > alpha:\n            print('Sample looks Gaussian (fail to reject H0)')\n        else:\n            print('Sample does not look Gaussian (reject H0)')\n        plt.subplots_adjust(top = 0.99, bottom=0.01, hspace=1.2, wspace=0.4)\n        plt.show()\n    print('*'*50 + '\\n' + '*'*50)\n    print(f\"The following features could not be Box-Cox transformed (negative or non-finite values): {list_non_transformed}.\\nHere are their untransformed plots:\")\n    for col in list_non_transformed:\n        data = dataframe_to_test[col]\n        fig = plt.figure()\n        ax1 = fig.add_subplot(211)\n        prob = probplot(data, dist=norm, plot=ax1)\n        ax1.set_xlabel('')\n        ax1.set_title('Probplot against normal distribution')\n        ax21 = fig.add_subplot(212)\n        ax21.hist(data)\n        ax21.set_xlabel(col)\n        ax21.set_title('distplot')\n    return list_box_cox, list_non_transformed<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>We use it to test our dataframe:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>list_features_to_box_cox, list_features_non_box_cox = box_cox_test(df_pour_var)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>We now have a list of variables that can be transformed via boxcox, and a list that cannot.<\/p>\n<p>We now transform the variables that can be transformed via boxcox. I&#8217;ve coded a function to obtain a dictionary containing the &#8220;features &#8211; lambda&#8221; associations used for the transformation. 
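<\/p>\n<p>To see why each lambda is worth keeping, here is a minimal, self-contained sketch on synthetic, strictly positive data (not the article&#8217;s dataset): inv_boxcox recovers the original series only when given the exact lambda that boxcox returned.<\/p>

```python
import numpy as np
from scipy.stats import boxcox
from scipy.special import inv_boxcox

# strictly positive synthetic data (Box-Cox requires positive values)
data = np.random.default_rng(1).gamma(2.0, 2.0, size=500) + 1.0

transformed, lam = boxcox(data)          # lam is the lambda fitted by MLE
restored = inv_boxcox(transformed, lam)  # invert using the SAME lambda

round_trip_ok = bool(np.allclose(restored, data))
print(round_trip_ok)
```

<p>One lambda per feature, stored at transform time and reused at inversion time. 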
This will come in handy for what&#8217;s to come!<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>def transfo_boxcox(dataframe_to_transform, list_features_to_transform):\n    \"\"\"_summary_\n    Args:\n        dataframe_to_transform (dataframe)\n        list_features_to_transform (list)\n    Returns:\n        transformed (dataframe): the dataframe after being transformed\n        dict_lambda(dict): the dict containing in key the name of a feature and in value the lambda used for transforming it via boxcox\n    \"\"\"\n    dict_lambda = {}\n    transformed = dataframe_to_transform.copy()\n    for ft in list_features_to_transform:\n        transformed[ft], l = boxcox(dataframe_to_transform[ft])\n        dict_lambda[ft] = l\n    return transformed, dict_lambda<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>df_var_transformed, dict_lambda_bc = transfo_boxcox(df_pour_var, list_features_to_box_cox)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>#We apply a minmaxscaler transform to the data we could not transform thanks to boxcox\nminmaxscaler = MinMaxScaler()\nminmaxscaler = minmaxscaler.fit(df_pour_var[list_features_non_box_cox])\ndf_var_transformed[list_features_non_box_cox] = minmaxscaler.transform(df_pour_var[list_features_non_box_cox])<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>It&#8217;s time to test our features and find out if they&#8217;re stationary. Here&#8217;s the test function:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>def dickey_fuller_test(data, seuil_signif=0.05, name='', verbose=False):\n    \"\"\"A function tests whether our features are stationary or not.\n    Args:\n        data (series): the data to test\n        seuil_signif (float, optional): test level. Defaults to 0.05.\n        name (str, optional): _description_. 
Defaults to ''.\n        verbose (bool, optional): _description_. Defaults to False.\n    \"\"\"\n    result = adfuller(data, autolag='AIC')\n    output = {'Test statistic':round(result[0], 4), 'p_value':round(result[1], 4), 'n_lags':round(result[2], 4), 'n_obs':result[3]}\n    p_value = output['p_value']\n    # Print Summary\n    print(f'    Augmented Dickey-Fuller test on : \"{name}\"', \"\\n   \", '~'*47)\n    print(' H0: the data shows a unit root, and then is not stationary.')\n    print(f' P-Value equals      : {p_value}')\n    print(f' Confidence level    = {seuil_signif}')\n    print(f' Test statistic   = {output[\"Test statistic\"]}')\n    print(f' Number of lags = {output[\"n_lags\"]}')\n    if p_value <= seuil_signif:\n        print(f\" => P-Value = {p_value} <= {seuil_signif}. We can reject H0 at this confidence level.\")\n        print(\" => The series is stationary.\")\n    else:\n        print(f\" => P-Value = {p_value} > {seuil_signif}. We can not reject H0 at this confidence level.\")\n        print(\" => The series is NOT stationary.\")<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>And the test in question:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>for name, column in df_var_transformed.items():\n    dickey_fuller_test(column, name=column.name)\n    print('\\n')<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>Unfortunately, some series are not stationary. So we&#8217;ll difference the series and repeat the test:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>df_diff_1_transformed = df_var_transformed.diff().dropna()\nexog_pour_var = exog_pour_var[1:] #we drop the first row of exog too, since differencing removed the first observation\nfor name, column in df_diff_1_transformed.items():\n    dickey_fuller_test(column, name=column.name)\n    print('\\n')<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>Now everything&#8217;s stationary! We can now move on to training the model.<\/p>\n<h3>5. 
The model and lag\/variable convergence<\/h3>\n<h4>a. Select variables<\/h4>\n<p>When we run a more &#8220;classic&#8221; classification or regression model, we always test the relevance of the input variables, i.e. the quality of the information they provide for predicting the target variable. In a time series exercise, it&#8217;s the same thing: a variable may explain the target variable poorly or not at all. So you have to decide which ones to keep and which ones to delete.<\/p>\n<p>In our case, there&#8217;s a test, the Granger causality test, that allows us to determine whether a variable helps explain the predictions of another variable. This is exactly what we&#8217;re looking for!<\/p>\n<p>Here&#8217;s the test function:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>def grangers_test(data, lag, test='ssr_chi2test',verbose=False):\n    \"\"\"_summary_\n        The values inside the dataframe are the p-values of the test for the couple of features X\/Y\n    H0 hypothesis is the following :\n        \"Forecasts of X do not influence forecasts of Y\"\n    Therefore, a p-value under the 0.05 threshold allows us to reject H0 with a confidence level of 95% and pushes us to keep this couple of variables\n    Args:\n        data (dataframe): dataframe that holds the value of each feature\n        lag (int): the lag to apply to the test. Must be equal to the lag used in the model\n        test (str, optional): the test to apply. 
Defaults to 'ssr_chi2test'.\n        verbose (bool, optional):  Defaults to False.\n    Returns:\n        the dataframe with p-values\n    \"\"\"\n    df = pd.DataFrame(np.zeros((len(data.columns), len(data.columns))), columns=data.columns, index=data.columns)\n    for col in df.columns:\n        for row in df.index:\n            test_result = grangercausalitytests(data[[row, col]], maxlag=[lag], verbose=False)\n            p_values = round(test_result[lag][0][test][1],4) #the p-value of the test at the given lag\n            p_value = np.min(p_values) #we keep the minimum p-value\n            df.loc[row, col] = p_value\n    df.columns = [var for var in data.columns]\n    df.index = [var + '_Y' for var in data.columns]\n    return df<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>In addition to the function that tests, we set up a function that will reduce our initial dataset depending on the results of the test: we test the dataframe, delete the variable with the highest sum of p-values, and iterate until all variables pass the test.<\/p>\n<p>Note that you can enter a &#8220;targeted feature&#8221;, which lets you protect a particular variable from ever being deleted. As a return value, the function returns the dataframe reduced in dimensionality. 
All that remains are variables whose predictions help to predict each of the other variables:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>def reduce_dim_granger_test(df_to_reduce, lag_to_apply, targeted_feature = None):\n    \"\"\"We reduce the dataset thanks to the Granger test\n        We test that every variable in the dataframe is useful to predict the others, and if not, we drop it\n        After each iteration of \"feature optimization\", we run the model fit again in order to verify that the best lag_order has not changed.\n        If it has, we run the Granger test optimization again\n    Args:\n        df_to_reduce (dataframe): the dataframe we want to reduce\n        lag_to_apply (int): the current lag order of the model we use in that precise iteration\n        targeted_feature (string): a feature that must never be dropped\n    Returns:\n        the reduced dataframe\n    \"\"\"\n    df_granger = grangers_test(df_to_reduce, lag = lag_to_apply)\n    np.fill_diagonal(df_granger.values, 0)\n    #plt.figure(figsize = (16,5))\n    #sns.heatmap(df_granger, annot = True)\n    #plt.show()\n    while (df_granger.values>= 0.05).any():\n        list_feature_non_granger_causal = []\n        list_value_non_granger_causal = []\n        for col in df_granger.columns:\n            if (df_granger[col].values >= 0.05).any() and col != targeted_feature:\n                list_feature_non_granger_causal.append(col)\n                list_value_non_granger_causal.append(sum(df_granger[col].values))\n        df_feature_to_pop = pd.DataFrame(list_value_non_granger_causal, index = list_feature_non_granger_causal, columns=['p-value'])\n        df_to_reduce = df_to_reduce.drop(df_feature_to_pop['p-value'].idxmax(), axis = 1)\n        df_granger = grangers_test(df_to_reduce, lag = lag_to_apply) #we re-test with the same lag, not a hardcoded one\n        np.fill_diagonal(df_granger.values, 0)\n        #plt.figure(figsize = (16,5))\n        #sns.heatmap(df_granger, annot = True)\n    return 
df_to_reduce<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>We can see that the function uses a parameter called &#8220;lag&#8221;: this is the model parameter we mentioned earlier. Depending on the lag selected, the test will not return the same values. So it&#8217;s very important to set this lag to the lag actually used by the model&#8230; This is where things get a little complicated.<\/p>\n<h4>b. Optimum lag selection<\/h4>\n<p>We want to find the lag value (\u03bb) that minimizes the chosen error on the chosen variable. To do this, we can either use a method available in the statsmodels package, or use an in-house function when the library&#8217;s metrics don&#8217;t suit us.<\/p>\n<p>In fact, statsmodels can only optimize a VAR model on statistical metrics (AIC, BIC). However, it&#8217;s not necessarily a good idea to optimize only statistical metrics. In our case, we&#8217;d rather optimize the MAE. To do this, we&#8217;ll need to compute our target metric for each candidate lag value:<\/p>\n<p>For a given lag level, test the relevance of the variables using our Granger test and, if necessary, eliminate unnecessary features.<br \/>Have a function that will de-transform our prediction (as a reminder, we currently work with variables that are transformed via boxcox or minmaxscaler, then differenced!)<br \/>Have a predict function<br \/>Have error measurement functions<br \/>Remember the best model to date for the chosen metric, in this case MAE.<\/p>\n<p>In this example, my models are tested each time on the variable I&#8217;ve set as my target (tsd). 
So I&#8217;m not measuring (and optimizing) the algorithm&#8217;s ability to predict the whole set of variables well.<\/p>\n<p>We&#8217;ll need a predict function:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>def predict_results(model_fitted, training_set, exog_for_predicting):\n    \"\"\"function to produce the forecasts\n    Args:\n        model_fitted (var model): the fitted model\n        training_set (dataframe): the training set\n        exog_for_predicting (dataframe): exogenous data indexed on the prediction horizon\n    Returns:\n        dataframe : returns three dataframes: the actual predictions, the upper confidence interval, the lower confidence interval\n    \"\"\"\n    lag_order = model_fitted.k_ar\n    input_data = training_set.values[-lag_order:]\n    y_predicted = model_fitted.forecast(y = input_data, steps = len(exog_for_predicting), exog_future=exog_for_predicting)\n    df_predicted_interval = model_fitted.forecast_interval(y = input_data, steps = len(exog_for_predicting), exog_future=exog_for_predicting)\n    interval_upper = pd.DataFrame(df_predicted_interval[2],  index=exog_for_predicting.index, columns=training_set.columns)\n    interval_lower = pd.DataFrame(df_predicted_interval[1], index=exog_for_predicting.index, columns=training_set.columns)\n    df_predicted = pd.DataFrame(y_predicted, index=exog_for_predicting.index, columns=training_set.columns)\n    return df_predicted, interval_upper, interval_lower<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>Then re-transform the variables, first by reversing the differencing:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>def inv_diff(df_orig, df_forecast, second_diff = False):\n    \"\"\"invert the differencing of a dataframe\n    Args:\n        df_orig (dataframe): the dataframe that existed JUST BEFORE differencing\n        df_forecast (dataframe): the dataframe with the forecasted values\n        second_diff (bool, optional): Did you 
apply two orders of differencing?\n    Returns:\n        dataframe with the diff reverted\n    \"\"\"\n    columns = df_forecast.columns\n    df_fc_inv = df_forecast.copy()\n    n_jour_cible = len(df_forecast)\n    for col in columns:\n        if second_diff:\n            df_fc_inv[str(col)+'_1d'] = (df_orig[col].iloc[-n_jour_cible-1]-df_orig[col].iloc[-n_jour_cible-2]) + df_fc_inv[str(col)].cumsum()\n            df_fc_inv[str(col)] = df_orig[col].iloc[-n_jour_cible-1] + df_fc_inv[str(col)+\"_1d\"].cumsum()\n        else:\n            df_fc_inv[str(col)] = df_orig[col].iloc[-n_jour_cible-1] + df_fc_inv[str(col)].cumsum()\n    return df_fc_inv<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>Then invert the boxcox &amp; minmaxscaler transformations on the correct variables:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>def invert_transform(list_of_features_boxcox, list_of_features_normalized, scaler, predicted, initial_dataframe, dict_lambda_value):\n    \"\"\"invert the boxcox transformation and that of a scaler of any kind\n    Args:\n        list_of_features_boxcox (list): list of features you transformed with Boxcox\n        list_of_features_normalized (list): list of features you normalized\n        scaler (scaler object): the scaler you initiated for normalizing\n        predicted (dataframe): forecasted dataframe\n        initial_dataframe (dataframe): the very first dataframe, not transformed\n        dict_lambda_value (dict): the dict in which we stored the lambda value of each feature transformed via boxcox\n    Returns:\n        the predicted dataframe with the de-transformed values\n    \"\"\"\n    list_of_features_boxcox = list(set(list_of_features_boxcox) & set(predicted.columns))\n    list_of_features_normalized = list(set(list_of_features_normalized) & set(predicted.columns))\n    scaler = scaler.fit(initial_dataframe[list_of_features_normalized]) #we re-fit the scaler on the normalized features that survived the reduction\n    predicted[list_of_features_normalized] = 
scaler.inverse_transform(predicted[list_of_features_normalized])\n    for col in list_of_features_boxcox:\n        predicted[col]=inv_boxcox(predicted[col], dict_lambda_value[col]).values\n    return predicted<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>And last but not least, a few custom functions for measuring errors:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>def r2_ajusted(r2, df):\n    longueur = len(df.index)\n    nb_col = len(df.columns)\n    score_ajusted = 1-((1-r2)*(longueur-1) \/ (longueur - nb_col -1))\n    return score_ajusted<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>def mape(y_test, pred):\n    y_test, pred = np.array(y_test), np.array(pred)\n    mape = np.mean(np.abs((y_test - pred) \/ y_test))\n    return mape<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>And finally we have our optimization function, which records the metrics of each iteration:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>def convergence_lag_features(transformed_diff_dataframe, transformed_dataframe, initial_dataframe, list_of_features_boxcox, list_of_features_normalized, scaler, dict_lambda_value,\n                             exog, maxlag_to_converge, n_period, targeted_feature):\n    \"\"\"We optimize the features according to the chosen lag until the lag and features no longer change\n    Args:\n        transformed_diff_dataframe (Dataframe): the dataframe you want to perform the convergence on. It is the dataframe you want the model\n        to be trained on, so it must be transformed, differenced if needed...\n        transformed_dataframe (Dataframe): the initial dataframe that is transformed but not yet differenced\n        exog (pandas dataframe): the exogenous features you give to your model. Must be known for train and test ! 
Good examples are holidays,\n        Fourier features...\n        maxlag_to_converge (int): the maximum number of lags to include in your model. The code will test them all, so the bigger it is, the longer\n        the training time ! Be careful as it grows quickly UwU\n        n_period (int): number of time periods (days, hours...) you want to predict\n        targeted_feature: the main feature you want to predict\n    Returns:\n    best_model (model object) : the best fitted model among all lags\n    training_set (pandas dataframe) : the training set after being Granger tested and reduced\n    training_set_exog (pandas dataframe) : the training set exog\n    testing_set (pandas dataframe) : the test set after being Granger tested and reduced\n    test_set_exog (pandas dataframe) : the testing set exog\n    transformed_dataframe_updated (pandas dataframe) : the transformed dataframe restricted to the selected features\n    training_metrics (pandas dataframe) : the metrics recorded at each iteration\n    initial_dataframe_uptaded (pandas dataframe) : the initial dataframe restricted to the selected features\n    \"\"\"\n    mae_training_loop = []\n    MAPE_training_loop = []\n    aic_training_loop = []\n    r2_training_loop= []\n    normality_training_loop = []\n    whiteness_training_loop = []\n    best_model = None\n    best_df_diff = None\n    training_set_exog = exog[:-n_period]\n    test_set_exog = exog[-n_period:]\n    initial_testing_set = initial_dataframe[-n_period:]\n    for i in range(1, maxlag_to_converge):\n        df_diff = transformed_diff_dataframe.copy()\n        df_diff= reduce_dim_granger_test(df_diff, i, targeted_feature)\n        training_set=df_diff[:-n_period]\n        testing_set = df_diff[-n_period:]\n        model = VAR(endog = training_set, exog = training_set_exog)\n        model = model.fit(i)\n        predict, predict_upper, predict_lower = predict_results(model, training_set, test_set_exog)\n        initial_testing_set_updated = initial_testing_set[predict.columns]\n        initial_dataframe_uptaded = initial_dataframe[predict.columns]\n        predict = inv_diff(transformed_dataframe, predict)\n        predict = invert_transform(list_of_features_boxcox, 
list_of_features_normalized, scaler, predict, initial_dataframe, dict_lambda_value)\n        mae = mean_absolute_error(initial_testing_set_updated[targeted_feature], predict[targeted_feature])\n        if len(mae_training_loop) == 0 or min(mae_training_loop) > mae:\n            best_model = model\n            best_df_diff = df_diff\n        mae_training_loop.append(mae)\n        MAPE_training_loop.append(mape(initial_testing_set_updated[targeted_feature], predict[targeted_feature]))\n        r2_training_loop.append(r2_ajusted(r2_score(initial_testing_set_updated[targeted_feature], predict[targeted_feature]), predict))\n        aic_training_loop.append(model.aic)\n        normality_training_loop.append(model.test_normality().pvalue)\n        whiteness_training_loop.append(model.test_whiteness(round((len(training_set))\/5)).pvalue)\n    training_metrics = pd.DataFrame()\n    training_metrics['mae'] = mae_training_loop\n    training_metrics['mape'] = MAPE_training_loop\n    training_metrics['r2'] = r2_training_loop\n    training_metrics['aic'] = aic_training_loop\n    training_metrics['normality'] = normality_training_loop  #H0: distribution is gaussian. if pvalue < 0.05 we reject and that is not good\n    training_metrics['whiteness'] =  whiteness_training_loop   #H0: there is no presence of autocorrelation. 
if pvalue < 0.05 we reject and that is not good\n    training_set=best_df_diff[:-n_period]\n    testing_set = best_df_diff[-n_period:]\n    initial_dataframe_uptaded = initial_dataframe[best_df_diff.columns]\n    transformed_dataframe_updated = transformed_dataframe[best_df_diff.columns]\n    return best_model, training_set, training_set_exog, testing_set, test_set_exog, transformed_dataframe_updated, training_metrics, initial_dataframe_uptaded<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>We launch the convergence and collect its outputs:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>model_var, train_var, train_var_exog, test_var, test_var_exog, df_var_transformed, df_optimizing_metrics, df_var = convergence_lag_features(df_diff_1_transformed, df_var_transformed, df_var, list_features_to_box_cox, list_features_non_box_cox,\n                                                                                                                                      minmaxscaler, dict_lambda_bc, exog_pour_var, 52, n_period_predict, target_variable)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<h3>6. 
Prediction with the best model and display of results<\/h3>\n<p>Now that we have our best optimized model, with the training\/testing sets and the initial dataframe restricted to the selected variables, we can predict and then invert the transformations. I predict the confidence intervals at the same time and de-transform them as well:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>df_predicted, df_interval_lower, df_interval_upper = predict_results(model_var, train_var, test_var_exog)\ndf_predicted = inv_diff(df_var_transformed, df_predicted)\ndf_interval_lower = inv_diff(df_var_transformed, df_interval_lower)\ndf_interval_upper = inv_diff(df_var_transformed, df_interval_upper)\ndf_predicted = invert_transform(list_features_to_box_cox, list_features_non_box_cox, minmaxscaler,\n                                df_predicted, df_var, dict_lambda_bc)\ndf_interval_lower = invert_transform(list_features_to_box_cox, list_features_non_box_cox, minmaxscaler,\n                                df_interval_lower, df_var, dict_lambda_bc)\ndf_interval_upper = invert_transform(list_features_to_box_cox, list_features_non_box_cox, minmaxscaler,\n                                df_interval_upper, df_var, dict_lambda_bc)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>Finally, I plot my true vs pred graphs:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>def append_axes(fig, as_cols=False):\n    \"\"\"Append a new Axes to the Figure, regridding the existing ones.\"\"\"\n    n = len(fig.axes) + 1\n    nrows, ncols = (1, n) if as_cols else (n, 1)\n    gs = mpl.gridspec.GridSpec(nrows, ncols, figure=fig)\n    for i, ax in enumerate(fig.axes):\n        ax.set_subplotspec(mpl.gridspec.SubplotSpec(gs, i))\n    return fig.add_subplot(nrows, ncols, n)\ndef plot_pred_vs_true(true_value, predicted, lower_int, upper_int, normality_of_residual = False):\n    \"\"\"The function will plot every true column of the dataframe against 
pred.\n    Args:\n        true_value (dataframe): df containing the true values\n        predicted (dataframe): df of the predictions\n        lower_int (dataframe): df with the lower interval values\n        upper_int (dataframe): df with the upper interval values\n        normality_of_residual (bool): do the residuals follow a gaussian? If not, the intervals will be strange \/EVIL LAUGHTER\/\n    \"\"\"\n    sns.set(style=\"ticks\", context=\"talk\")\n    plt.style.use(\"dark_background\")\n    plt.rc('legend', **{'fontsize': 25})\n    fig = plt.figure(layout='constrained', figsize=(40, 70))\n    col = predicted.columns\n    predicted.columns = predicted.columns + '_pred'\n    lower_int.columns = lower_int.columns + '_low_int'\n    upper_int.columns = upper_int.columns + '_up_int'\n    predicted.plot(subplots=True, ax=append_axes(fig), color='r', fontsize=30, legend=True)\n    if normality_of_residual:\n        lower_int.plot(subplots=True, ax=fig.axes, color='r', fontsize=30, legend=True, linestyle='-.')\n        upper_int.plot(subplots=True, ax=fig.axes, color='r', fontsize=30, legend=True, linestyle='-.')\n    true_value[col].plot(subplots=True, ax=fig.axes, color='b', fontsize=30, legend=True)\n    plt.show()<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>plot_pred_vs_true(initial_testing_set, df_predicted, df_interval_lower, df_interval_upper, normality_of_residual=False)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<h3>7. Conclusion<\/h3>\n<p>In this exercise, we achieve an MAE of 3068 and a MAPE of 9.4% on our target variable. Here&#8217;s the true\/pred plot for the target variable &#8216;tsd&#8217;:<\/p>\n<p>If my residuals don&#8217;t pass the normality test (i.e. they don&#8217;t follow a Gaussian), I risk ending up with completely false confidence intervals! So don&#8217;t rely on them. 
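<\/p>\n<p>As a minimal, self-contained sketch (the toy data and the fitted-model name below are illustrative assumptions, not the article&#8217;s actual pipeline), here is how one could gate the interval plotting on statsmodels&#8217; normality test:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>import numpy as np\nimport pandas as pd\nfrom statsmodels.tsa.api import VAR\n\n#toy dataframe, for illustration only\nrng = np.random.default_rng(0)\ndf_toy = pd.DataFrame(rng.normal(size=(200, 2)), columns=['tsd', 'other'])\nmodel_var = VAR(df_toy).fit(maxlags=2)\n\n#H0: the residuals are gaussian. If pvalue < 0.05 we reject H0,\n#so the analytical confidence intervals should not be plotted\nnormality_of_residual = bool(model_var.test_normality().pvalue >= 0.05)\nprint(normality_of_residual)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<p>The resulting boolean can then be passed as the <code>normality_of_residual<\/code> argument of the plotting function. 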
There are methods for making residuals Gaussian, but this will be the subject of a future article.<\/p>\n<p>We can also note that, in terms of code elegance, we could do much better with object-oriented code, by creating a new class built on top of statsmodels&#8217; VAR model.<\/p>\n<p>To conclude, here&#8217;s the full code executed from A to Z, excluding the function definitions:<\/p>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>df = pd.read_csv(\"path.csv\")\ndf['settlement_date'] = pd.to_datetime(df['settlement_date'])\ndf = df.set_index('settlement_date')\ndf = df.drop(['settlement_period', 'period_hour'], axis=1)\nexog = df[['is_holiday']].copy()\ndf = df.drop(['is_holiday'], axis=1)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>target_variable = \"tsd\" #the feature we ultimately want to predict\nn_period_predict = 48 * 7 #the number of periods we want to predict: 7 days with 48 data points per day<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>mask = (df.index > '2022-01-01') #I will use the data post 2022\ndf_var = df.loc[mask]\nexog_pour_var = exog.loc[mask]\ndf_var = df_var + 1 #shift every value by 1 so that no value equals 0 (boxcox needs strictly positive values)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>initial_testing_set = df_var[-n_period_predict:] #there will be a lot of transformations, so we keep a safe, untransformed testing set<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>list_features_to_box_cox, list_features_non_box_cox = box_cox_test(df_var)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>df_var_transformed, dict_lambda_bc = transfo_boxcox(df_var, 
list_features_to_box_cox)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>#We apply a MinMaxScaler to the features that boxcox could not transform\nminmaxscaler = MinMaxScaler()\nminmaxscaler = minmaxscaler.fit(df_var[list_features_non_box_cox])\ndf_var_transformed[list_features_non_box_cox] = minmaxscaler.transform(df_var[list_features_non_box_cox])<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>for name, column in df_var_transformed.items():\n    dickey_fuller_test(column, name=column.name)\n    print('\\n')<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>df_diff_1_transformed = df_var_transformed.diff().dropna()\nexog_pour_var = exog_pour_var[1:] #offset by 1 because differencing dropped the first row\nfor name, column in df_diff_1_transformed.items():\n    dickey_fuller_test(column, name=column.name)\n    print('\\n')<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>model_var, train_var, train_var_exog, test_var, test_var_exog, df_var_transformed, df_optimizing_metrics, df_var = convergence_lag_features(df_diff_1_transformed, df_var_transformed, df_var, list_features_to_box_cox, list_features_non_box_cox,\n                                                                                                                                      minmaxscaler, dict_lambda_bc, exog_pour_var, 52, n_period_predict, target_variable)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>print(df_optimizing_metrics.describe())<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>df_predicted, df_interval_lower, df_interval_upper = predict_results(model_var, train_var, 
test_var_exog)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>plot_pred_vs_true(initial_testing_set, df_predicted, df_interval_lower, df_interval_upper, normality_of_residual=False)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Definition of the VAR Model: Equation &amp; Significance Library Import and Setup Stationarity: What Is It and Why Is It Important The Model and Lag\/Variable Convergence Selecting Variables Selecting the Optimal Lag Prediction with the Best Model and Result Visualization Conclusion 1. Introduction Welcome to this explanatory article about the VAR model. We will [&hellip;]<\/p>\n","protected":false},"author":78,"featured_media":172221,"comment_status":"open","ping_status":"open","sticky":false,"template":"elementor_theme","format":"standard","meta":{"_acf_changed":false,"editor_notices":[],"footnotes":""},"categories":[2433],"class_list":["post-172220","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai"],"acf":[],"_links":{"self":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/172220","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/users\/78"}],"replies":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/comments?post=172220"}],"version-history":[{"count":1,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/172220\/revisions"}],"predecessor-version":[{"id":206307,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/172220\/revisions\/206307"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media\/172221"}],"wp:attachment":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media?parent=172220"}]
,"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/categories?post=172220"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}