Introduction: Since the late 1990s, research and administrative institutions have been developing health data warehouses and increasingly reusing claims data. The impact of these changes has not yet been fully quantified. Our objective was to compare how the number of patients included per study changed in observational versus interventional studies over a 20-year period starting in 1995.
Materials and methods: We extracted the abstracts of all studies published in three leading medical journals over the period 1995–2014 (18,107 studies). The analysis then proceeded in two steps. First, we built a predictive model based on a support vector machine (SVM) to classify each abstract as an “observational”, “interventional”, or “other” study. Second, we built an algorithm based on regular expressions to automatically extract the number of included patients.
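The second step can be illustrated with a minimal sketch. The pattern, the noun list, and the function name `extract_patient_count` are hypothetical and are not the rules used in the study; the sketch also assumes that the largest count mentioned in an abstract is the number of included patients, which is only a heuristic.

```python
import re
from typing import Optional

# Hypothetical pattern: an integer (optionally with thousands separators)
# followed, within at most three intervening words, by a participant noun.
_COUNT = r"(\d{1,3}(?:,\d{3})+|\d+)"
_NOUN = r"(?:patients|participants|subjects|individuals|women|men|children|adults)"
PATTERN = re.compile(rf"\b{_COUNT}\s+(?:\w+\s+){{0,3}}?{_NOUN}\b", re.IGNORECASE)


def extract_patient_count(abstract: str) -> Optional[int]:
    """Return the largest participant count found in the text, or None.

    Assumption: the largest matching number is the enrolled total.
    """
    counts = [int(m.group(1).replace(",", "")) for m in PATTERN.finditer(abstract)]
    return max(counts) if counts else None


if __name__ == "__main__":
    print(extract_patient_count("We randomly assigned 1,234 patients to two groups."))
```

A production rule set would likely need more patterns (numbers written as words, "n = 500", counts split across study arms) and a manual validation sample to estimate the extraction error rate.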
Results: Over the investigated period, the median number of enrolled patients per study increased for interventional studies, from 282 in 1995–1999 to 629 in 2010–2014. Over the same period, the median increased more sharply for observational studies, from 368 in 1995–1999 to 2078 in 2010–2014.
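To make the comparison concrete, the reported medians can be expressed as fold changes between the first and last five-year periods. The helper `fold_change` below is a hypothetical name used for illustration; the four input values are the medians reported above.

```python
def fold_change(early: float, late: float) -> float:
    """Ratio of the later median to the earlier median."""
    return late / early


# Median patients per study: (1995-1999, 2010-2014), as reported in the Results.
medians = {
    "interventional": (282, 629),
    "observational": (368, 2078),
}

for design, (early, late) in medians.items():
    print(f"{design}: {fold_change(early, late):.2f}-fold")
```

This gives roughly a 2.2-fold increase for interventional studies and a 5.6-fold increase for observational studies.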
Discussion: The routine storage of increasing amounts of data (from data warehouses or claims data) has, in recent years, affected the number of patients included in observational studies. The recent development of “randomized registry trials”, which combine an intervention with identification of the outcome through data reuse, may likewise affect the number of patients included in randomized clinical trials over the next decade.