Validating Default Models when the Validation Data are Corrupted: Analytic Results and Bias Corrections
by Roger M. Stein of Massachusetts Institute of Technology & State Street Corporation
July 13, 2013
Abstract: There has been a growing recognition in industry that issues of data quality, which are routine in practice, can materially affect the assessment of credit model performance. In this paper, we develop analytic results that are useful in sizing the biases associated with tests of default model power performed using corrupt (“noisy”) data. Because it is sometimes unavoidable to test models with data that are known to be corrupt, we also provide guidance on interpreting the results of such tests. In some cases, with appropriate knowledge of the corruption mechanism, the true values of the performance statistics of interest may be recovered (in expectation), even when the underlying data have been corrupted. We also provide estimators of the standard errors of such recovered performance statistics. An analysis of the estimators reveals interesting behavior, including the observation that “noisy” data do not “cancel out” across models, even when the same corrupt data set is used to test multiple default models. Because our results are analytic, they may be applied in a broad range of settings without the need for simulation experiments.
Keywords: ROC analysis, data noise, model errors, model validation, rating system, binary classification, calibration, discriminatory power.