Advancing hip osteoarthritis prediction: insights from multi-modal predictive modeling with individual participant data of the World COACH consortium

Authors:

M.A. van den Berg, F. Boel, M.M.A. van Buuren, N.S. Riedstra, J. Tang, H. Ahedi, N. Arden, S.M.A. Bierma-Zeinstra, C.G. Boer, F.M. Cicuttini, T.F. Cootes, K.M. Crossley, D.T. Felson, W.P. Gielis, J.J. Heerey, G. Jones, S. Kluzek, N.E. Lane, C. Lindner, J.A. Lynch, J.B.J. van Meurs, A. Mosler, A.E. Nelson, M.C. Nevitt, E.H. Oei, J. Runhaar, H. Weinans, J.H. Krijthe, R. Agricola

Abstract

Introduction

Radiographic hip osteoarthritis (RHOA) is a multifactorial disease, which makes early detection of individuals at risk challenging, yet such detection is essential for timely intervention and for evaluating preventive strategies. Integrating information from multiple data modalities, using individual participant data from diverse cohorts, may enhance predictive modeling in the early stages of RHOA. A focus on model interpretability may further enable the identification of clinically relevant patient subgroups and potential intervention targets.

Objective

To develop a multi-modal prediction model that improves the prediction of incident RHOA compared to clinical features alone, and to investigate the estimated predictor effects and the generalizability of the models to similar populations.

Methods

We pooled individual participant data from nine prospective cohort studies within the Worldwide Collaboration on OsteoArthritis prediCtion for the Hip (World COACH consortium). All studies included standardized anteroposterior pelvic, long-limb, and/or hip radiographs, assessed for RHOA at baseline and after 4–8 years of follow-up. Incident RHOA was defined as the development of RHOA (grade ≥2) in hips without definite RHOA at baseline (grade <2). The original cohort values of the clinical predictors were harmonized into one consistent dataset. X-ray-derived predictors describing hip morphology (the alpha angle and the lateral center-edge angle) were determined automatically and uniformly from landmark points placed with Bonefinder®. Additionally, the values of 13 shape modes, together explaining 85% of the variation in a landmark-based statistical shape model (SSM), were included. This SSM was built on all hips with baseline RHOA grade <2 within World COACH. Risk prediction models were built with generalized linear mixed-effects models (GLMMs) and random forest (RF) models, adjusting for correlations within cohorts and individuals. The discriminative performance, measured as the area under the receiver operating characteristic curve (AUC), of the different model configurations and of the linear versus non-linear approaches was compared through stratified 5-fold cross-validation. For each model configuration, predictions were made with and without cohort labels to assess heterogeneity between cohorts.
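The outcome definition and the evaluation scheme described above can be illustrated with a minimal Python sketch. It is not the consortium's code: the function names (`incident_rhoa`, `stratified_folds`, `auc`) are hypothetical, only the standard library is used, and the fold assignment ignores the clustering of hips within individuals and cohorts that the actual models adjust for.

```python
import random
from collections import defaultdict


def incident_rhoa(baseline_grade, followup_grade):
    """Return 1 or 0 for incident RHOA, or None if the hip is not eligible.

    Hips with definite RHOA at baseline (grade >= 2) are excluded.
    """
    if baseline_grade >= 2:
        return None
    return 1 if followup_grade >= 2 else 0


def stratified_folds(labels, k=5, seed=0):
    """Assign a fold index to each sample so that every fold keeps
    roughly the same proportion of cases (assumption: no grouping by
    individual or cohort, unlike a clustered analysis)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, i in enumerate(indices):
            fold_of[i] = position % k
    return fold_of


def auc(labels, scores):
    """AUC as the probability that a randomly chosen case receives a
    higher score than a randomly chosen non-case (ties count as 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))
```

In a cross-validation loop, a model would be fitted on four folds and `auc` computed on the predictions for the held-out fold, repeating this for each of the five folds. With an incidence of about 5%, stratification keeps every fold from ending up with too few incident hips to estimate the AUC reliably.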

Results

In total, 29,110 hips without definite RHOA at baseline were included, of which 5.0% developed RHOA within 4–8 years (mean age 63.7 (8.6) years, 75.5% female, mean BMI 27.5 (4.7) kg/m²). When comparing our uni-modal prediction model using only the clinical predictors (Model 1) with the models to which X-ray information was added (Table 1), we observed a higher discriminative performance for the multi-modal models. Overall, including cohort information significantly improved model performance (p < 0.05), and the RF models performed slightly, but not significantly, better than the GLMMs. A comparison of the average effects on incident RHOA of the significant predictors in the models including all predictors (Figure 1) showed that the GLMM and RF estimates differed most at the minimum and maximum predictor values.

Conclusion

By leveraging multi-modal data, we improved the prediction of incident RHOA compared to clinical features alone. Our findings indicate that future work on this task could benefit from considering non-linear modeling approaches.

Published: https://doi.org/10.1016/j.ostima.2025.100343

Figure 1 – Partial dependence plots for significant predictors in model identified by the generalized linear mixed model (GLMM). Dashed orange lines indicate predictions from the GLMM, solid blue lines represent those from the random forest (RF). Each point on the x-axis shows the average predicted incident RHOA probability across all included hips, with the other 21 predictors held constant. If applicable, adjusted odds ratios (aORs) and their 95% confidence interval estimated by the GLMM are shown in the legend and correspond to the same units as the x-axis in the subsequent plot.
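How a partial dependence curve of this kind is computed can be sketched in a few lines of Python, assuming the standard definition: one predictor is set to a fixed value for every hip, the remaining predictors are left unchanged, and the predicted probabilities are averaged. The function `partial_dependence`, the model `toy_model`, its coefficients, and the example hips are all hypothetical and serve only as illustration; they are not taken from the study.

```python
import math


def partial_dependence(predict, rows, feature, grid):
    """For each grid value, overwrite `feature` in a copy of every row
    and return the average prediction per grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy, so the input rows stay unchanged
            modified[feature] = value  # only this predictor is varied
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical logistic model with made-up coefficients."""
    logit = -3.0 + 0.03 * (row["age"] - 60) + 0.02 * (row["alpha_angle"] - 50)
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up example hips (illustrative values only).
hips = [
    {"age": 55, "alpha_angle": 45},
    {"age": 65, "alpha_angle": 60},
    {"age": 70, "alpha_angle": 52},
]

curve = partial_dependence(toy_model, hips, "alpha_angle", [40, 50, 60, 70])
```

Because `predict` is passed in as a plain function, the same routine can be applied to a linear and a non-linear model, which is what allows two curves to be overlaid per predictor.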