Dataset shift assessment measures in monitoring predictive models
Faculty of Technology (Wydział Techniczny), Akademia im. Jakuba z Paradyża (employee)
2021
scientific article / paper
English
Keywords: dataset shift; Population Stability Index (PSI); Population Accuracy Index (PAI); monitoring of predictive models
The article presents the results of a study carried out as part of the project "Hybrid system for intelligent diagnostics of prognostic models", co-financed by the National Centre for Research and Development from the European Regional Development Fund. The study concerned the phenomenon of dataset shift, also known as a shift in the distributions of variables. The key question is whether the distribution of the current data processed by a deployed forecasting model has changed significantly compared with the distribution of the data used to develop it; if so, the model may no longer operate correctly. In the context of assessing and monitoring the stability of the variable distributions of predictive models, the aim of the study was to compare the properties of two indicators: the Population Stability Index (PSI) and the Population Accuracy Index (PAI). Both measures were calculated for 78 controlled shifts in the distributions of the 3 explanatory variables of a hypothetical prognostic model. The research procedure was carried out in two scenarios. Scenario 1 compared PSI and PAI for the distributions of categorical variables. Scenario 2 examined whether discretizing the variables significantly influenced the assessment of their distribution stability using PSI, compared with PAI, which does not require discretization. The results showed that the two indicators complement each other well: when used together to assess the stability of a model's variable distributions, they compensate for each other's shortcomings. PSI and PAI measure subtly different concepts of stability (PSI responds to any change in the distribution of the explanatory variables, whereas PAI measures only how that change affects the prognostic accuracy of the model), so they should be treated as complementary measures.
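To make the contrast between the two indicators concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly used PSI formula, PSI = Σ (a_i − e_i) ln(a_i / e_i) over matching categories or bins, with empty cells clipped to a small `eps` (an assumption) to avoid log(0). For PAI it assumes a variance-ratio definition (mean variance of the predicted mean response on the new data divided by the same quantity on the development data) applied to a hypothetical ordinary least-squares model with intercept, for which Var(ŷ(x)) = σ² x'(X'X)⁻¹x and σ² cancels. All function names and the quantile-based binning rule are hypothetical illustrations of the categorical case (scenario 1) and the discretized case (scenario 2).

```python
import numpy as np


def psi(expected_shares, actual_shares, eps=1e-6):
    """PSI = sum_i (a_i - e_i) * ln(a_i / e_i) over matching categories or bins.
    Shares are clipped at eps (assumption) so empty cells do not produce log(0)."""
    e = np.clip(np.asarray(expected_shares, dtype=float), eps, None)
    a = np.clip(np.asarray(actual_shares, dtype=float), eps, None)
    return float(np.sum((a - e) * np.log(a / e)))


def category_shares(values, categories):
    """Share of observations in each category (categorical variable, scenario 1)."""
    values = np.asarray(values)
    return np.array([np.mean(values == c) for c in categories])


def bin_shares(values, inner_edges):
    """Share of observations in each bin defined by interior cut points (scenario 2)."""
    idx = np.digitize(values, inner_edges)  # bin index 0 .. len(inner_edges)
    counts = np.bincount(idx, minlength=len(inner_edges) + 1)
    return counts / counts.sum()


def quantile_edges(dev_values, n_bins=10):
    """Hypothetical discretization rule: interior cut points at development quantiles."""
    return np.quantile(dev_values, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])


def pai_linear(X_dev, X_new):
    """Assumed variance-ratio PAI for an OLS model with intercept fitted on X_dev.
    Var(y_hat(x)) = sigma^2 * x' (X'X)^-1 x, so sigma^2 cancels in the ratio.
    Works on raw (undiscretized) explanatory variables."""
    Xd = np.column_stack([np.ones(len(X_dev)), X_dev])
    Xn = np.column_stack([np.ones(len(X_new)), X_new])
    inv_xtx = np.linalg.inv(Xd.T @ Xd)
    # Row-wise quadratic form x_i' (X'X)^-1 x_i, averaged over each sample.
    v_dev = np.einsum("ij,jk,ik->i", Xd, inv_xtx, Xd).mean()
    v_new = np.einsum("ij,jk,ik->i", Xn, inv_xtx, Xn).mean()
    return float(v_new / v_dev)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_dev = rng.normal(0.0, 1.0, size=5000)
    x_new = rng.normal(0.5, 1.0, size=5000)  # hypothetical controlled mean shift

    edges = quantile_edges(x_dev, n_bins=10)
    print("PSI (10 quantile bins):", psi(bin_shares(x_dev, edges), bin_shares(x_new, edges)))
    print("PAI (OLS, raw variable):", pai_linear(x_dev, x_new))
```

In this sketch PSI depends on the chosen bins (change `n_bins` and the value changes), while the PAI variant needs no binning and equals 1 when the new data match the development data. It rises above 1 when new observations move away from the development centroid, where the model's predictions are less precise, and falls below 1 when they concentrate near it. A PSI-detected shift therefore does not always imply worse accuracy, which is one way to read the complementarity the abstract describes.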
2021
pp. 3391–3402
CC BY-NC-ND (Attribution-NonCommercial-NoDerivatives)
open journal
final published version
09.2021 (date presumed)
at the time of publication
5
5
70