Tags: Federated learning, Missing data and Multiple sources
Abstract:
Establishing causal dependencies is crucial in applied domains, such as medicine and healthcare, where decision-making must be explainable. In these settings, small sample sizes and missing data call for federated approaches to maximise the amount of information we can use. We propose a novel federated causal discovery algorithm capable of pooling information from multiple sources with heterogeneous missing data to learn a graph representing cause-effect relationships. In particular, we learn a causal graph on a centralised server while taking into account both prior knowledge and the missingness mechanism specific to each client. We applied the proposed algorithm to a real-world, multicentric study on endometrial cancer and validated the resulting causal graph through quantitative analyses and a clinical literature review. Our approach learns an accurate model despite the presence of data missing not at random.
Federated Causal Discovery with Missing Data in a Multicentric Study on Endometrial Cancer