We address the problem of uncertainty calibration and introduce a novel calibration method, Parametrized Temperature Scaling (PTS). Standard deep neural networks typically yield uncalibrated predictions, which can be transformed into calibrated confidence scores using post-hoc calibration methods. In this contribution, we demonstrate that the performance of accuracy-preserving state-of-the-art post-hoc calibrators is limited by their intrinsic expressive power. We generalize temperature scaling by computing prediction-specific temperatures, parameterized by a neural network. We show with extensive experiments that our novel accuracy-preserving approach consistently outperforms existing algorithms across a large number of model architectures, datasets and metrics.
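The core idea above, computing a prediction-specific temperature with a neural network, can be sketched as follows. The tiny MLP, its hidden size, and the softplus parameterization are illustrative assumptions rather than the paper's actual design, and the weights here are untrained:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class ParamTempScaler:
    """Maps each logit vector to its own temperature via a tiny MLP (untrained here)."""

    def __init__(self, n_classes, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_classes, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def temperature(self, logits):
        h = np.maximum(logits @ self.W1 + self.b1, 0.0)         # ReLU hidden layer
        return np.log1p(np.exp(h @ self.W2 + self.b2)) + 1e-3   # softplus keeps T > 0

    def calibrate(self, logits):
        T = self.temperature(logits)   # shape (N, 1): one temperature per prediction
        return softmax(logits / T)     # dividing by T > 0 never changes the argmax
```

Because every per-sample temperature is strictly positive, the predicted class is unchanged, which is exactly why the approach is accuracy-preserving.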
To facilitate widespread acceptance of AI systems guiding decision making in real-world applications, trustworthiness of deployed models is key. That is, it is crucial for predictive models to be uncertainty-aware and yield well-calibrated (and thus trustworthy) predictions for both in-domain samples as well as under domain shift. Recent efforts to account for predictive uncertainty include post-processing steps for trained neural networks, Bayesian neural networks as well as alternative non-Bayesian approaches such as ensemble approaches and evidential deep learning. Here, we propose an efficient yet general modelling approach for obtaining well-calibrated, trustworthy probabilities for samples obtained after a domain shift. We introduce a new training strategy combining an entropy-encouraging loss term with an adversarial calibration loss term and demonstrate that this results in well-calibrated and technically trustworthy predictions for a wide range of domain drifts. We comprehensively evaluate previously proposed approaches on different data modalities, a large range of data sets including sequence data, network architectures and perturbation strategies. We observe that our modelling approach substantially outperforms existing state-of-the-art approaches, yielding well-calibrated predictions under domain drift.
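One common way to realize an entropy-encouraging loss term is to subtract a weighted entropy bonus from the cross-entropy. This is a sketch, not the paper's exact objective: the formulation and the weight `lam` are assumptions, and the adversarial calibration term is omitted entirely:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_encouraging_loss(logits, labels, lam=0.1):
    """Cross-entropy minus a weighted entropy bonus: higher predictive
    entropy (i.e. less over-confidence) lowers the loss."""
    p = softmax(logits)
    n = len(labels)
    ce = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    ent = -(p * np.log(p + 1e-12)).sum(axis=1).mean()
    return ce - lam * ent
```

With `lam = 0` this reduces to plain cross-entropy; a positive `lam` trades a little training-set fit for softer, less over-confident output distributions.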
We address the problem of uncertainty calibration. While standard deep neural networks typically yield uncalibrated predictions, calibrated confidence scores that are representative of the true likelihood of a prediction can be achieved using post-hoc calibration methods. However, to date the focus of these approaches has been on in-domain calibration. Our contribution is two-fold. First, we show that existing post-hoc calibration methods yield highly over-confident predictions under domain shift. Second, we introduce a simple strategy where perturbations are applied to samples in the validation set before performing the post-hoc calibration step. In extensive experiments, we demonstrate that this perturbation step results in substantially better calibration under domain shift on a wide range of architectures and modelling tasks.
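The proposed recipe, perturb the validation samples and then run the usual post-hoc calibration, can be sketched with plain temperature scaling as the calibrator. The additive Gaussian noise, the noise scale `sigma`, and the grid-searched temperature fit are illustrative choices, not the paper's exact setup:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fit_temperature(logits, labels):
    """Standard temperature scaling: pick the single T minimizing validation NLL."""
    best_T, best_nll = 1.0, np.inf
    for T in np.linspace(0.5, 5.0, 91):
        p = softmax(logits / T)
        nll = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
        if nll < best_nll:
            best_T, best_nll = T, nll
    return best_T

def calibrate_after_perturbation(logits_fn, X_val, y_val, sigma=0.5, seed=0):
    """Perturb validation inputs first (here: additive Gaussian noise),
    then fit the post-hoc calibrator on the perturbed samples."""
    rng = np.random.default_rng(seed)
    X_pert = X_val + rng.normal(0.0, sigma, X_val.shape)
    return fit_temperature(logits_fn(X_pert), y_val)
```

The intuition is that calibrating on perturbed samples mimics the conditions the model will face after a domain shift, so the fitted temperature is less over-confident out of domain.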
Diagnosing diseases such as leukemia or anemia requires reliable counts of blood cells. Hematologists usually label and count microscopy images of blood cells manually. In many cases, however, cells in different maturity states are difficult to distinguish, and in combination with image noise and subjectivity, humans are prone to making labeling mistakes. This results in labels that are often not reproducible, which can directly affect the diagnoses. We introduce TIMELY, a probabilistic model that combines pseudotime inference methods with inhomogeneous hidden Markov trees, which addresses this challenge of label inconsistency. We show first on simulation data that TIMELY is able to identify and correct wrong labels with higher precision and recall than baseline methods for labeling correction. We then apply our method to two real-world blood cell datasets and show that TIMELY successfully finds inconsistent labels, thereby improving the quality of human-generated labels.
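As a drastically simplified illustration of the idea behind TIMELY (assumptions: a plain chain HMM over cells ordered by pseudotime instead of an inhomogeneous hidden Markov tree, hand-picked transition and emission probabilities, and Viterbi decoding), mismatches between the observed human labels and the decoded maturity states can flag likely labeling mistakes:

```python
import numpy as np

def viterbi(obs, trans, emit, init):
    """Most likely hidden-state sequence under a chain HMM (log-space Viterbi)."""
    n, k = len(obs), trans.shape[0]
    lt, le, li = np.log(trans), np.log(emit), np.log(init)
    dp = np.zeros((n, k))
    ptr = np.zeros((n, k), dtype=int)
    dp[0] = li + le[:, obs[0]]
    for t in range(1, n):
        scores = dp[t - 1][:, None] + lt        # (prev_state, cur_state)
        ptr[t] = scores.argmax(axis=0)
        dp[t] = scores.max(axis=0) + le[:, obs[t]]
    path = [int(dp[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(ptr[t, path[-1]]))
    return path[::-1]

# Three maturity stages along pseudotime; cells mostly stay put or advance.
trans = np.array([[0.79, 0.20, 0.01],
                  [0.01, 0.79, 0.20],
                  [0.01, 0.20, 0.79]])
# Human labels are usually right; confusions are stage-dependent (made-up values).
emit = np.array([[0.90, 0.08, 0.02],
                 [0.05, 0.90, 0.05],
                 [0.02, 0.08, 0.90]])
init = np.array([0.90, 0.05, 0.05])

labels = [0, 0, 1, 1, 0, 2, 2]   # the 0 at position 4 breaks the maturation ordering
states = viterbi(labels, trans, emit, init)
flagged = [t for t, (l, s) in enumerate(zip(labels, states)) if l != s]
```

Here the decoded state sequence stays monotone, so the out-of-order label at position 4 is flagged as inconsistent; TIMELY's actual model is far richer, but this captures the label-vs-latent-state comparison.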