

Current methods for building risk models assume that each risk factor has a uniform, averaged effect across the population. They use weighted sums of individual risk factors from regression models with only a few interaction terms, for example with age. This does not allow the effect of a risk factor to vary between morbidity contexts. This study modified a supervised Bayesian statistical learning method, topic modelling, so that individual factors can have different effects depending on a patient's other comorbidities. The modified topic model was used to assess more than 71,000 unique risk factors in a population cohort of 1.4 million adults, using routinely collected data. The model learnt prognostically important risk factor patterns that predicted 5-year survival, and achieved excellent calibration and discrimination, with a C statistic of 0.9 in a held-out validation cohort. The model explained 92% of the observed variation in 5-year survival in the population. These findings support the use of survival-supervised Bayesian topic modelling within large, routine electronic population health data to identify prognostically important risk factor patterns.
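
For readers unfamiliar with the discrimination measure reported above, the following is a minimal sketch of how a C statistic (Harrell's concordance index) can be computed for right-censored survival data. It is not the study's evaluation code: the function name `concordance_index`, its arguments, and the toy data are assumptions introduced for illustration only, and pairs with tied follow-up times are simply skipped to keep the sketch short.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C for right-censored data (illustrative sketch).

    times       -- observed follow-up time for each patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- predicted risk; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        # a is the patient with the shorter observed follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher predicted risk died first
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration (not from the study cohort).
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.7, 0.8, 0.3, 0.1]
print(concordance_index(times, events, risk_scores))  # 7 of 8 comparable pairs concordant: 0.875
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking of patients by survival time, so a C statistic of 0.9 indicates that the model ranks roughly nine in ten comparable patient pairs correctly.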