Towards Trustworthy Predictions from Deep Neural Networks with Fast Adversarial Calibration


To facilitate widespread acceptance of AI systems guiding decision making in real-world applications, trustworthiness of deployed models is key. That is, it is crucial for predictive models to be uncertainty-aware and yield well-calibrated (and thus trustworthy) predictions for in-domain samples as well as under domain shift. Recent efforts to account for predictive uncertainty include post-processing steps for trained neural networks, Bayesian neural networks, as well as alternative non-Bayesian approaches such as ensemble approaches and evidential deep learning. Here, we propose an efficient yet general modelling approach for obtaining well-calibrated, trustworthy probabilities for samples obtained after a domain shift. We introduce a new training strategy combining an entropy-encouraging loss term with an adversarial calibration loss term and demonstrate that this results in well-calibrated and technically trustworthy predictions for a wide range of domain drifts. We comprehensively evaluate previously proposed approaches on different data modalities, a large range of data sets including sequence data, network architectures and perturbation strategies. We observe that our modelling approach substantially outperforms existing state-of-the-art approaches, yielding well-calibrated predictions under domain drift.
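The abstract names the ingredients of the training strategy (a standard task loss, an entropy-encouraging term, and an adversarial calibration term) without giving their exact form. The following is a minimal, illustrative numpy sketch of how such a combined objective could be assembled; the function names, the squared confidence–accuracy gap used as a calibration proxy, and the weighting hyperparameters `lam` and `gamma` are all assumptions for illustration, not the paper's actual definitions.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def predictive_entropy(probs):
    # Shannon entropy of each predictive distribution.
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def combined_loss(logits, labels, logits_adv, lam=0.5, gamma=0.5):
    """Illustrative combined objective (an assumption, not the paper's
    exact loss): cross-entropy on clean logits, minus an
    entropy-encouraging term, plus a calibration penalty on logits
    from adversarially perturbed inputs (here a simple squared gap
    between mean confidence and mean accuracy)."""
    p = softmax(logits)
    n = len(labels)
    ce = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    ent = predictive_entropy(p).mean()        # reward uncertainty
    p_adv = softmax(logits_adv)
    conf = p_adv.max(axis=1).mean()
    acc = (p_adv.argmax(axis=1) == labels).mean()
    calib = (conf - acc) ** 2                 # adversarial calibration gap
    return ce - lam * ent + gamma * calib
```

In this sketch, confidently wrong predictions on the perturbed inputs inflate the calibration gap and hence the loss, which is the qualitative behaviour an adversarial calibration term is meant to penalize.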

In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI 21)
Florian Buettner
Professor for Bioinformatics in Oncology