This paper analyzes multi-period mortgage risk at the loan and pool levels using an unprecedented dataset of over 120 million prime and subprime mortgages originated across the United States between 1995 and 2014. The dataset includes the individual characteristics of each loan, monthly updates on loan performance over the life of the loan, and a number of time-varying economic variables at the zip-code level. We develop, estimate, and test dynamic machine learning models for mortgage prepayment, delinquency, and foreclosure that capture loan-to-loan correlation arising from geographic proximity and exposure to common risk factors. The basic building block is a deep neural network that models the nonlinear relationship between the explanatory variables and loan performance. Our likelihood estimates, which are based on 3.5 billion borrower-month observations, indicate that mortgage risk is strongly influenced by local economic factors such as zip-code-level foreclosure rates. The out-of-sample predictive performance of our deep learning model is a significant improvement over that of linear models such as logistic regression. Because of the computational challenges posed by the volume of data, model parameters are estimated using GPU parallel computing. The deep learning model's superior accuracy relative to linear models translates directly into improved performance for investors: portfolios constructed with the deep learning model have lower prepayment and delinquency rates than portfolios chosen with a logistic regression.
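To make the basic building block more concrete, the following is a minimal sketch, not the paper's implementation, of a neural network that maps borrower-month features to a probability distribution over next-month loan states, together with the negative log-likelihood that such a model would minimize. The state labels, the single hidden ReLU layer, the layer sizes, and the synthetic data are all hypothetical choices made for illustration. Removing the hidden layer reduces the model to a multinomial logistic regression, the kind of linear baseline the abstract compares against. The sketch does not model loan-to-loan correlation through common risk factors.

```python
import numpy as np

# Hypothetical next-month loan states, for illustration only.
STATES = ["current", "30dpd", "60dpd", "90dpd+", "foreclosure", "reo", "paid_off"]

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_params(n_features, n_hidden, n_states, seed=0):
    # Hypothetical parameter layout: one hidden layer plus a softmax output layer.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_features), (n_features, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_states)),
        "b2": np.zeros(n_states),
    }

def transition_probs(params, X):
    # The ReLU hidden layer supplies the nonlinearity that logistic regression lacks.
    h = np.maximum(0.0, X @ params["W1"] + params["b1"])
    return softmax(h @ params["W2"] + params["b2"])

def neg_log_likelihood(params, X, next_state):
    # Average negative log-likelihood of the observed next-month states,
    # one row of X per borrower-month observation.
    p = transition_probs(params, X)
    picked = p[np.arange(len(next_state)), next_state]
    return -np.mean(np.log(picked + 1e-12))

# Usage on synthetic data (features could include loan terms and zip-code variables).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = rng.integers(0, len(STATES), size=1000)
params = init_params(n_features=5, n_hidden=16, n_states=len(STATES))
P = transition_probs(params, X)
print(P.shape)  # (1000, 7)
print(neg_log_likelihood(params, X, y))
```

In practice the parameters would be fitted by minimizing this objective with a gradient-based optimizer. At the scale of billions of borrower-month rows, the abstract notes that estimation is carried out with GPU parallel computing.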