Article

Title

Dataset shift assessment measures in monitoring predictive models

Authors

[ 1 ] Wydział Techniczny, Akademia im. Jakuba z Paradyża | [ P ] employee

Scientific discipline (Law 2.0)

[2.3] Information and communication technology

Year of publication

2021

Published in

Procedia Computer Science

Journal year: 2021 | Journal volume: 192

Article type

scientific article / paper

Publication language

English

Keywords
EN
  • Dataset shift
  • Population Stability Index (PSI)
  • Population Accuracy Index (PAI)
  • monitoring of predictive model
Abstract

EN The article presents the results of a study forming part of the work carried out under the project entitled “Hybrid system for intelligent diagnostics of prognostic models”, co-financed through the National Centre for Research and Development from the European Regional Development Fund. The study concerned the analysis of dataset shift, also known as the shift of variable distributions. The key question is whether the distribution of current data for a deployed forecasting model has changed significantly compared to the distribution of the data used to develop it. If so, the model may operate incorrectly. In the context of assessing and monitoring the stability of variable distributions of predictive models, the aim of the study was to compare the properties of two indicators, the Population Stability Index (PSI) and the Population Accuracy Index (PAI). These measures were calculated for 78 controlled shifts of the distributions of the 3 explanatory variables of a hypothetical prognostic model. The research procedure was carried out in 2 scenarios. The first involved a comparison of PSI and PAI for the distributions of categorical variables. The second examined whether discretization of variables significantly influenced the assessment of the stability of their distributions using PSI, compared to PAI, which does not require such a procedure. The results showed that the two indicators complement each other well and, when used together to assess the stability of the model’s variable distributions, compensate for each other’s shortcomings. PSI and PAI measure subtly different concepts of stability: PSI measures any change in the distribution of explanatory variables, whereas PAI measures only how this change affects the prognostic accuracy of the model. They should therefore be treated as complementary measures.
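
Note: the abstract does not state how PSI is computed. As an illustration only, the sketch below uses the commonly cited definition, PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), where eᵢ and aᵢ are the shares of category i in the development and current data. The function name `psi`, the `eps` floor for empty categories, and the sample data are assumptions made for this example and are not taken from the article. It covers only the categorical case (scenario 1); a continuous variable would first have to be discretized into bins. PAI is not sketched here because its definition is not given in this record.

```python
import math
from collections import Counter


def psi(development, current, eps=1e-6):
    """Population Stability Index for one categorical variable.

    development, current: non-empty sequences of category labels.
    eps: assumed floor for category shares, avoiding log(0) when a
         category is missing from one of the samples.
    """
    dev_counts = Counter(development)
    cur_counts = Counter(current)
    total = 0.0
    for category in set(dev_counts) | set(cur_counts):
        e = max(dev_counts[category] / len(development), eps)  # development share
        a = max(cur_counts[category] / len(current), eps)      # current share
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical data: a 50/50 split at development time drifts to 80/20.
development_sample = ["A"] * 50 + ["B"] * 50
current_sample = ["A"] * 80 + ["B"] * 20

print(round(psi(development_sample, development_sample), 4))  # no shift
print(round(psi(development_sample, current_sample), 4))      # clear shift
```

Every term of the sum is non-negative, so PSI is zero only when the two distributions coincide and grows with any change in them, regardless of whether that change affects the model's accuracy. This matches the distinction the abstract draws between PSI and PAI.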

Date of online publication

2021

Pages (from - to)

3391 - 3402

DOI

10.1016/j.procs.2021.09.112

URL

https://www.sciencedirect.com/science/article/pii/S1877050921018512?via%3Dihub

Presented on

25th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, 8-10.09.2021, Szczecin, Poland

License type

CC BY-NC-ND (attribution - noncommercial - no derivatives)

Open Access Mode

open journal

Open Access Text Version

final published version

Release date

09.2021 (Date presumed)

Date of Open Access to the publication

at the time of publication

Ministry points / journal

5

Ministry points / journal in years 2017-2021

5

Ministry points / conference (CORE)

70