This course will cover topics at the intersection of machine learning and econometrics. Using Breiman's (2001) notion of two cultures in statistical modelling, we start with a review of the fundamental differences between machine learning and econometrics. We contrast a modelling approach in which the analyst makes explicit assumptions about model specification, including functional form, with an approach in which the data-generating mechanism is treated as unknown. We then consider the econometrician's concern for internal and external validity alongside machine learning's focus on ensuring that a model is robust in the sense of generalising to unseen data. We also examine the distinction between models used to solve a prediction problem and models used to estimate some form of causal effect.
We cover the three broad areas in which machine learning is used, namely prediction, classification and causal effects, and in each case link the exposition to a parametric benchmark. For prediction we consider the piecewise nonlinear regression model, for classification we review the fundamentals of parametric binary choice models, and for causal effects we look at the specification of treatment effect models.
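As an illustration of the kind of parametric benchmark referred to above, the following sketch estimates a binary logit by maximum likelihood using Newton-Raphson. It is a minimal example, not course material: the simulated data, sample size and coefficient values are chosen purely for illustration.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Fit a binary logit by Newton-Raphson (maximum likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted probabilities
        grad = X.T @ (y - p)                  # score vector
        W = p * (1.0 - p)
        hess = -(X * W[:, None]).T @ X        # Hessian of the log-likelihood
        beta -= np.linalg.solve(hess, grad)   # Newton step
    return beta

# Simulated data: outcome generated from known coefficients
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([0.5, -1.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
beta_hat = fit_logit(X, y)
```

With a sample of this size the maximum-likelihood estimates recover the coefficients used to generate the data, which is the sense in which the parametric model serves as a benchmark for the classifiers covered later.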
We will also cover the use of ensemble methods as an averaging and regularisation device. Here we explore a number of general methods for model averaging, including bootstrap aggregation (so-called bagging) and random forests.
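The mechanics of bagging can be sketched in a few lines: fit a base learner on repeated bootstrap resamples of the data and average the resulting predictions. The sketch below uses a depth-one regression tree (a "stump") as a stand-in for the deeper trees used in a random forest; the data-generating process and all parameter values are illustrative assumptions, not taken from the course.

```python
import numpy as np

def fit_stump(x, y):
    """Depth-1 regression tree: one split chosen to minimise squared error."""
    best = (np.inf, 0.0, y.mean(), y.mean())
    for t in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1], best[2], best[3]   # threshold, left mean, right mean

def bagged_predict(x_train, y_train, x_new, n_boot=200, rng=None):
    """Bagging: fit the base learner on bootstrap resamples, average predictions."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(x_train)
    preds = np.zeros_like(x_new, dtype=float)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)            # bootstrap sample (with replacement)
        t, lo, hi = fit_stump(x_train[idx], y_train[idx])
        preds += np.where(x_new <= t, lo, hi)
    return preds / n_boot

# Nonlinear target: averaging over bootstrap fits smooths the stump's hard split
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 1000)
y = np.sign(x) + rng.normal(scale=0.3, size=1000)
yhat = bagged_predict(x, y, np.array([-1.5, 1.5]))
```

A random forest extends this scheme by using deep trees as the base learner and additionally randomising the subset of regressors considered at each split, which decorrelates the trees and strengthens the variance-reduction effect of averaging.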
For machine learning models in prediction, classification and causal effects we provide worked examples in Stata and Python. We also demonstrate the integration of Python code within Stata.
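As a flavour of that integration, the fragment below is a minimal sketch, assuming Stata 16 or later with Python configured: a `python:`/`end` block reads a Stata variable through the Stata Function Interface (`sfi`), computes in Python, and writes the result back as a Stata scalar. The dataset and scalar name are illustrative.

```stata
* Sketch: pass a Stata variable to Python and return a result to Stata
sysuse auto, clear

python:
from sfi import Data, Scalar            # Stata Function Interface
price = Data.get("price")               # read a Stata variable into Python
Scalar.setValue("mean_price", sum(price) / len(price))
end

display scalar(mean_price)
```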