Research
Estimating Correlations Between Clinical Trial Outcomes Using Generalized Estimating Equations
Dai, Yuehao, Andrew W. Lo, Manish Singh, Qingyang Xu, and Ruixun Zhang (2025), Estimating Correlations Between Clinical Trial Outcomes Using Generalised Estimating Equations, Oxford Bulletin of Economics and Statistics, Early View. https://doi.org/10.1111/obes.70025.
Accurately estimating the correlations among clinical trial outcomes is crucial for managing the risk of biopharmaceutical investment portfolios. We propose a novel algorithm for estimating correlations in large clinical trial datasets using a generalised estimating equations (GEE) framework. Our algorithm outperforms existing methods in both convergence speed and computational efficiency. Empirical analysis of over 25,000 clinical trials reveals a correlation of approximately 10% for trials within therapeutic areas and up to 40% for trials sharing the same phase or mechanism hierarchy. Trials targeting the same disease show slightly negative correlations, suggesting a first-mover advantage. Our approach offers a scalable method to estimate correlations within large clinical trial datasets.
Use of Bayesian decision analysis in the design of patient-centered clinical trials for kidney failure devices
Ben Chaouch, Zied, Qingyang Xu, Shomesh E. Chaudhuri, David J. Gebben, Raymond C. Harris, Frank P. Hurst, Jennifer E. Flythe, Carol Mansfield, Anindita Saha, Murray Sheldon, Kien Wei Siah, Michelle Tarver, Katherine Treiman, Melissa West, Dallas Wood, and Andrew W. Lo (2025), Use of Bayesian Decision Analysis in the Design of Patient-Centered Clinical Trials for Kidney Failure Devices, Computers in Biology and Medicine 198, 111150. https://doi.org/10.1016/j.compbiomed.2025.111150.
Integrating patient preferences into the design of randomized clinical trials (RCTs) may help accelerate innovation for alternative kidney replacement therapy by appropriately selecting a trial's significance level and sample size, and have a meaningful impact on people suffering from kidney failure. While a conventional one-sided significance level threshold of 2.5 % is often used to assess the safety of a proposed device, we show in this study that it is not necessarily consistent with the risk-preferences of patients with dialysis-dependent kidney disease. We apply a Bayesian decision analysis (BDA) framework to results from a patient preference survey and estimate the optimal significance level and sample size required in an RCT to assess the safety of a hypothetical dialysis device. Based on survey responses from 599 patients with dialysis-dependent kidney failure, we found that the optimal significance level threshold differs significantly from the classical 2.5 % threshold used in one-sided hypothesis tests across various patient subgroups. On average, patients tended to require a significance level of 1.2 % for the risk of bleeding and a significance level <0.1 % for the risk of serious infection, suggesting that the survey respondents were not willing to bear either type of additional risk presented by the hypothetical device in exchange for the possible benefits described in the survey. However, there was heterogeneity among the patient subgroups of dialysis modality, age, gender, ethnicity, and time on dialysis. Overall, our study shows that the BDA framework is a robust, systematic, transparent, and reproducible method for incorporating patient preference information into the design and regulatory review process of clinical trials for novel therapeutics.
Liu, Fengze, Haoyu Wang, Joonhyuk Cho, Dan Roth, and Andrew W. Lo (2025), AutoCT: Automating Interpretable Clinical Trial Prediction with LLM Agents, EMNLP 2025, Suzhou China, November 2025.
Clinical trials are critical for advancing medical treatments but remain prohibitively expensive and time-consuming. Accurate prediction of clinical trial outcomes can significantly reduce research and development costs and accelerate drug discovery. While recent deep learning models have shown promise by leveraging unstructured data, their black-box nature, lack of interpretability, and vulnerability to label leakage limit their practical use in high-stakes biomedical contexts. In this work, we propose AutoCT, a novel framework that combines the reasoning capabilities of large language models with the explainability of classical machine learning. AutoCT autonomously generates, evaluates, and refines tabular features based on public information without human input. Our method uses Monte Carlo Tree Search to iteratively optimize predictive performance. Experimental results show that AutoCT performs on par with or better than SOTA methods on clinical trial prediction tasks within only a limited number of self-refinement iterations, establishing a new paradigm for scalable, interpretable, and cost-efficient clinical trial prediction.
A father’s crusade in rare disease drug development: a case study of Elpida therapeutics and Melpida
Portero, Deanna, Qingyang Xu, Aaliya Hussain, and Andrew W. Lo (2025), A Father’s Crusade in Rare Disease Drug Development: A Case Study of Elpida Therapeutics and Melpida, Orphanet Journal of Rare Diseases 20, https://doi.org/10.1186/s13023-025-03892-0.
Therapeutic development for rare diseases is difficult for pharmaceutical companies due to significant scientific challenges, extensive costs, and low financial returns. It is increasingly common for caregivers and patient advocacy groups to partner with biomedical professionals to finance and develop treatments for rare diseases. This case study illustrates the story of Terry Pirovolakis, a father who partnered with biomedical professionals to develop the novel gene therapy, Melpida, within 36 months of the diagnosis of his infant son. We identify the factors that led to the success of Melpida and analyze the business model of Elpida Therapeutics, a social purpose corporation founded by Pirovolakis to reproduce the success of Melpida for other rare diseases. We conclude with four lessons from Melpida to inform caregivers like Pirovolakis on developing novel gene therapies to save their loved ones.
Cho, Joonhyuk, Eugene Sorets, Shomesh Chaudhuri, Annette De Mattos, Kristin Drake, Merit E. Cudkowicz, Ricardo Ortiz, Meredith Hasenoehrl, Marianne Chase, Brittney Harkey, Sabrina Paganoni, and John Frishkopf (2025), Financing Drug Development via Adaptive Platform Trials, PLoS ONE 20 (7), e0325826, https://doi.org/10.1371/journal.pone.0325826.
We propose a new approach to funding disease-specific drug development via a variation of the adaptive platform trial. This trial is designed to test a portfolio of drug candidates in parallel, with the cost of the trial partially covered by investors who receive payments from a royalty fund of the candidates in exchange for investment. Under realistic assumptions for cost, revenue, probability of success, drug sales, and royalty rates, investors may expect a return of 28%, but with a 22% probability of total loss. Such return distributions may be attractive to hedge funds, family offices, and philanthropic investors seeking both social impact and financial return. Return distributions palatable to mainstream investors may be achieved by funding multiple platform trials simultaneously and securitizing the aggregate cash flows.
Cho, Joonhyuk, Qingyang Xu, Chi Heem Wong, and Andrew W. Lo (2025), Predicting Clinical Trial Duration via Statistical and Machine Learning Models, Contemporary Clinical Trials Communications 45, 101473. https://doi.org/10.1016/j.conctc.2025.101473.
We apply survival analysis as well as machine learning models to predict the duration of clinical trials using the largest dataset so far constructed in this domain. Neural network-based DeepSurv yields the most accurate predictions and we identify key factors that are most predictive of trial duration. This methodology may help clinical researchers optimize trial designs for expedited testing, and can also reduce the financial risk of drug development, which in turn will lower the cost of funding and increase the amount of capital allocated to this sector.
Lo, Andrew W., Ruixun Zhang, and Chaoyi Zhao (2025), The Evolution of Discrimination Under Finite Memory Constraints, Scientific Reports 15, 31774. https://doi.org/10.1038/s41598-025-17089-9.
We develop an evolutionary model for individual discriminatory behavior that emerges naturally in a mixed population as an adaptive strategy. Our findings show that, when individuals have finite memory and face uncertain environments, they may rely on prior biases and observable group traits to make decisions, changing their discriminatory practices. We also demonstrate that a finite memory is a consequence of natural selection because it leads to higher fitness in dynamic environments with mutations. This adaptability allows individuals with finite memory to better respond to environmental variability, offering a potential evolutionary advantage. Our study suggests that memory constraints and environmental changes are critical factors in sustaining biased behavior, providing insights into the persistence of discrimination in real-world settings and possible mitigation strategies across fields, including education, policymaking, and artificial intelligence.
Kumar, Neil, Andrew W. Lo, Chinmay Shukla, and Brian Stephenson (2024), Applications of Portfolio Theory to Accelerating Biomedical Innovation, Journal of Portfolio Management 51 (1), 213-236.
Biomedicine is experiencing an inflection point in which the origins of many human diseases have been decoded, leading to new treatments and, in some cases, complete cures. Many domain experts acknowledge that the gating factor to innovation is not knowledge, but rather a lack of financial resources to translate theory into practice, the so-called “valley of death” between scientific discovery and the clinical testing that must be done with human subjects before regulators will approve a new drug or medical device. This process of translational medicine is largely an exercise in risk management—organized as a carefully planned sequence of experiments, each one involving a progressively larger number of subjects that may or may not be allowed to continue, depending on the results of the prior experiment. It is, therefore, a natural setting in which to apply modern portfolio theory. The authors describe one such application involving a biotechnology company focused on genetic diseases and the lessons learned from that experience.
Innovative Insurance to Improve US Patient Access to Cell and Gene Therapy
Conti, Rena M., Patrick DeMartino, Jonathan Gruber, Andrew W. Lo, Yutong Sun, and Jackie Wu (2025), Innovative Insurance to Improve US Patient Access to Cell and Gene Therapy, The Milbank Quarterly 103 (1), 32-51, https://doi.org/10.1111/1468-0009.12728.
CONTEXT: Cell and gene therapies (CGTs) offer treatment for rare and oftentimes deadly diseases. Because of their high price and uncertain clinical outcomes, US insurers commonly restrict patient access to CGTs, and these barriers may create or perpetuate disparities. A reconsideration of existing insurance policies to improve access and reduce disparities is currently underway. One method insurers use to support access and protect themselves from large, unexpected claims is the purchase of reinsurance. In exchange for an upfront per-member-per-month (PMPM) premium, the reinsurer pays the claim and rebates the insurer at the end of the contract period if there are funds left over. However, existing reinsurance plans may not cover CGTs or may charge exorbitant fees for coverage.
METHODS: We simulate the incremental annual per-person reinsurer costs to cover CGTs existing or expected between 2023 and 2035 for the US population and by payer type, based on previously published estimates of expected US spending on CGTs, an assumed US population of 330 million persons, and current CGT reinsurance fees. We illustrate our methods by estimating the incremental annual per-person costs across all payers and to state Medicaid plans of sickle cell disease–targeted CGTs.
FINDINGS: We estimate annual incremental spending on CGTs from 2023 to 2035 to amount to $20.4 billion, or $15.69 per person. Total annual estimated spending is expected to concentrate among commercial plans. Sickle cell–targeted CGTs add a maximum of $0.78 PMPM in costs to all payers, and these costs will concentrate within state Medicaid programs. Reinsurance fees add to expected costs.
CONCLUSIONS: Annual per-person costs to provide access to CGTs are expected to concentrate in commercial and state Medicaid plans. Policies that improve CGT coverage and affordability are needed.
LLM economicus? Mapping the Behavioral Biases of LLMs via Utility Theory
Ross, Jillian, Yoon Kim, and Andrew W. Lo (2024), LLM economicus? Mapping the Behavioral Biases of LLMs via Utility Theory, COLM 2024, Philadelphia, Pennsylvania, October 2024. https://doi.org/10.48550/arXiv.2408.02784.
Humans are not homo economicus (i.e., rational economic beings). As humans, we exhibit systematic behavioral biases such as loss aversion, anchoring, framing, etc., which lead us to make suboptimal economic decisions. Insofar as such biases may be embedded in text data on which large language models (LLMs) are trained, to what extent are LLMs prone to the same behavioral biases? Understanding these biases in LLMs is crucial for deploying LLMs to support human decision-making. We propose utility theory, a paradigm at the core of modern economic theory, as an approach to evaluate the economic biases of LLMs. Utility theory enables the quantification and comparison of economic behavior against benchmarks such as perfect rationality or human behavior. To demonstrate our approach, we quantify and compare the economic behavior of a variety of open- and closed-source LLMs. We find that the economic behavior of current LLMs is neither entirely human-like nor entirely economicus-like. We also find that most current LLMs struggle to maintain consistent economic behavior across settings. Finally, we illustrate how our approach can measure the effect of interventions such as prompting on economic biases.