Model Building
How to replace values using a dictionary built from a spreadsheet file (names:regres)

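import pandas as pd

# Mapping file with a "previous" column (old labels) and a "new" column (replacement labels).
purpose3 = pd.read_csv("purpose3.csv")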


# Build the old -> new lookup from the two columns of the mapping file.
deet = dict(zip(purpose3["previous"], purpose3["new"]))

df_bor['loan_purpose3'] = df_bor['loan_purpose1'].map(deet)
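
One thing to watch with map and a dictionary: any value without a key in the dictionary comes back as NaN. A minimal sketch of that behaviour and one possible fallback is below. The small tables here are made up for illustration and only stand in for purpose3.csv and df_bor; the labels in them are assumptions, not the real data.

```python
import pandas as pd

# Hypothetical lookup table standing in for purpose3.csv.
purpose3 = pd.DataFrame({
    "previous": ["car", "auto", "home improvement"],
    "new": ["vehicle", "vehicle", "housing"],
})
deet = dict(zip(purpose3["previous"], purpose3["new"]))

# Hypothetical data standing in for df_bor.
df_bor = pd.DataFrame({"loan_purpose1": ["car", "auto", "wedding"]})

# map() returns NaN for values that have no key in the dictionary ...
mapped = df_bor["loan_purpose1"].map(deet)

# ... so fall back to the original value where no mapping exists.
df_bor["loan_purpose3"] = mapped.fillna(df_bor["loan_purpose1"])
print(df_bor)
```

Whether to keep the original value or leave the NaN depends on the model: leaving the NaN makes unmapped categories easy to spot before training.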

Finance Python Reader

import pandas as pd
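# Workaround: older pandas_datareader releases import is_list_like from
# pandas.core.common, where newer pandas versions no longer provide it.
pd.core.common.is_list_like = pd.api.types.is_list_like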
from pandas_datareader import data, wb
import fix_yahoo_finance as yf
import numpy as np
import datetime

Both a start and an end date are set here; the download covers 1990-01-01 through 2016-01-01.
end = datetime.datetime(2016, 1, 1)
start = datetime.datetime(1990, 1, 1)
spx = data.get_data_yahoo('^GSPC',start,end)
vix = data.get_data_yahoo('^VIX',start,end)

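You are always leaking information when you fit preprocessing on the full dataset before cross-validation. It is an easy mistake to make, and it contaminates the train and test splits because statistics from the held-out folds have already fed into the training data.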

This pipeline makes a lot more sense, and it should be used inside the cross-validation. In fact, I don't see it used in cross-validation in this mate's work, so there might be an issue in his work.
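
A minimal sketch of what "pipeline inside cross-validation" can look like with scikit-learn. The synthetic dataset, the scaler and the logistic regression are assumptions chosen for illustration only; they are not the model from the work being discussed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The scaler lives inside the pipeline, so each CV split fits it on the
# training folds only and then applies it to the held-out fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# One accuracy score per fold.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

The leaky version would call StandardScaler().fit_transform(X) once on everything and then cross-validate only the classifier; passing the whole pipeline to cross_val_score avoids that.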

Flexible arguments: *args ~> lets a function accept any number of positional arguments; inside the function, args is a tuple holding all of them.

def f(*args):
  for i in args: