Publications
LLM economicus? Mapping the Behavioral Biases of LLMs via Utility Theory (Working Paper)
2024Humans are not homo economicus (i.e., rational economic beings). As humans, we exhibit systematic behavioral biases such as loss aversion, anchoring, framing, etc., which lead us to make suboptimal economic decisions. Insofar as such biases may be embedded in text data on which large language models (LLMs) are trained, to what extent are LLMs prone to the same behavioral biases? Understanding these biases in LLMs is crucial for deploying LLMs to support human decision-making. We propose utility theory-a paradigm at the core of modern economic theory-as an approach to evaluate the economic biases of LLMs. Utility theory enables the quantification and comparison of economic behavior against benchmarks such as perfect rationality or human behavior. To demonstrate our approach, we quantify and compare the economic behavior of a variety of open- and closed-source LLMs. We find that the economic behavior of current LLMs is neither entirely human-like nor entirely economicus-like. We also find that most current LLMs struggle to maintain consistent economic behavior across settings. Finally, we illustrate how our approach can measure the effect of interventions such as prompting on economic biases.
Can ChatGPT Plan Your Retirement?: Generative AI and Financial Advice
2024We identify some of the most pressing issues facing the adoption of large language models (LLMs) in practical settings, and propose a research agenda to reach the next technological inflection point in generative AI. We focus on three challenges facing most LLM applications: domain-specific expertise and the ability to tailor that expertise to a user’s unique situation, trustworthiness and adherence to the user’s moral and ethical standards, and conformity to regulatory guidelines and oversight. These challenges apply to virtually all industries and endeavors in which LLMs can be applied, such as medicine, law, accounting, education, psychotherapy, marketing, and corporate strategy. For concreteness, we focus on the narrow context of financial advice, which serves as an ideal test bed both for determining the possible shortcomings of current LLMs and for exploring ways to overcome them. Our goal is not to provide solutions to these challenges—which will likely take years to develop—but to propose a framework and road map for solving them as part of a larger research agenda for improving generative AI in any application.
Generative AI from Theory to Practice: A Case Study of Financial Advice
2024We identify some of the most pressing issues facing the adoption of large language models (LLMs) in practical settings and propose a research agenda to reach the next technological inflection point in generative AI. We focus on three challenges facing most LLM applications: domain-specific expertise and the ability to tailor that expertise to a user’s unique situation, trustworthiness and adherence to the user’s moral and ethical standards, and conformity to regulatory guidelines and oversight. These challenges apply to virtually all industries and endeavors in which LLMs can be applied, such as medicine, law, accounting, education, psychotherapy, marketing, and corporate strategy. For concreteness, we focus on the narrow context of financial advice, which serves as an ideal test bed both for determining the possible shortcomings of current LLMs and for exploring ways to overcome them. Our goal is not to provide solutions to these challenges—which will likely take years to develop—but rather to propose a framework and road map for solving them as part of a larger research agenda for improving generative AI in any application.
From ELIZA to ChatGPT: The Evolution of NLP and Financial Applications
2023Natural language processing (NLP) has revolutionized the financial industry, providing advanced techniques for the processing, analyzing, and understanding of unstructured financial text. The authors provide a comprehensive overview of the historical development of NLP, starting from early rules-based approaches to recent advances in deep learning–based NLP models. They also discuss applications of NLP in finance along with its challenges, including data scarcity and adversarial examples, and speculate about the future of NLP in the financial industry. To illustrate the capability of current NLP models, a state-of-the-art chatbot is employed as a co-author of this article.
Deep-Learning Models for Forecasting Financial Risk Premia and Their Interpretations
2023The measurement of financial risk premia, the amount that a risky asset will outperform a risk-free one, is an important problem in asset pricing. The noisiness and non-stationarity of asset returns makes the estimation of risk premia using machine learning (ML) techniques challenging. In this work, we develop ML models that solve the problems associated with risk premia forecasting by separating risk premia prediction into two independent tasks, a time series model and a cross-sectional model, and using neural networks with skip connections to enable their deep neural network training.These models are tested robustly with different metrics, and we observe that our models outperform several existing standard ML models. A known issue with ML models is their ‘black box’ nature, i.e. their opaqueness to interpretability. We interpret these deep neural networks using local approximation-based techniques that provide explanations for our model’s predictions.
Explainable Machine Learning Models of Consumer Credit Risk
2023In this work, the authors create machine learning (ML) models to forecast home equity credit risk for individuals using a real-world dataset and demonstrate methods to explain the output of these ML models to make them more accessible to the end user. They analyze the explainability for various stakeholders: loan companies, regulators, loan applicants, and data scientists, incorporating their different requirements with respect to explanations. For loan companies, they generate explanations for every model prediction of creditworthiness. For regulators, they perform a stress test for extreme scenarios. For loan applicants, they generate diverse counterfactuals to guide them with steps toward a favorable classification from the model. Finally, for data scientists, they generate simple rules that accurately explain 70%–72% of the dataset. Their study provides a synthesized ML explanation framework for all stakeholders and is intended to accelerate the adoption of ML techniques in domains that would benefit from explanations of their predictions.
Estimation and Prediction for Algorithmic Models of Investor Behavior
2022We propose a Markov chain Monte Carlo (MCMC) algorithm for estimating the parameters of algorithmic models of investor behavior. We show that this method can successfully infer the relative importance of each heuristic among a large cross-section of investors, even when the number of observations per investor is quite small. We also compare the accuracy of the MCMC approach to regression analysis in predicting the relative importance of heuristics at the individual and aggregate levels and conclude that MCMC predicts aggregate weights more accurately while regression outperforms in predicting individual weights.
When Do Investors Freak Out? Machine Learning Predictions of Panic Selling
2022Using a novel dataset of 653,455 individual brokerage accounts belonging to 298,556 households, we document the frequency, timing, and duration of panic sales, which we define as a decline of 90% of a household account’s equity assets over the course of one month, of which 50% or more is due to trades. We find that a disproportionate number of households make panic sales when there are sharp market downturns, a phenomenon we call ‘freaking out.’ We show that panic selling and freak-outs are predictable and fundamentally different from other well-known behavioral patterns such as overtrading or the disposition effect.
An Artificial Intelligence-Based Industry Peer Grouping System
2022In this article, the authors develop a data-driven peer grouping system using artificial intelligence (AI) tools to capture market perception and, in turn, group companies into clusters at various levels of granularity. In addition, they develop a continuous measure of similarity between companies; they use this measure to group companies into clusters and construct hedged portfolios. In the peer groupings, companies grouped in the same clusters had strong homogeneous risk and return profiles, whereas different clusters of companies had diverse, varying risk exposures. The authors extensively evaluated the clusters and found that companies grouped by their method had higher out-of-sample return correlation but lower stability and interpretability than companies grouped by a standard industry classification system. The authors also develop an interactive visualization system for identifying AI-based clusters and similar companies.
Algorithmic Models of Investor Behavior
2021We propose a heuristic approach to modeling investor behavior by simulating combinations of simpler systematic investment strategies associated with well-known behavioral biases—in functional forms motivated by an extensive review of the behavioral finance literature—using parameters calibrated from historical data. We compute the investment performance of these heuristics individually and in pairwise combinations using both simulated and historical asset-class returns. The mean-reversion or momentum nature of a heuristic can often explain its effect on performance, depending on whether asset returns are consistent with such dynamics. These algorithms show that seemingly irrational investor behavior may, in fact, have been shaped by evolutionary forces and can be effective in certain environments and maladaptive in others.
SCRAM: A Platform for Securely Measuring Cyber Risk
2020We develop a new cryptographic platform called SCRAM (Secure Cyber Risk Aggregation and Measurement) that allows multiple entities to compute aggregate cyber-risk measures without requiring any entity to disclose its own sensitive data on cyberattacks, penetrations, and losses. Using the SCRAM platform, we present results from two computations in a pilot study with six large private-sector companies: (1) benchmarks of the adoption rates of 171 critical security measures and (2) links between monetary losses from 49 security incidents and the specific sub-control failures implicated in each incident. These results provide insight into problematic cyber-risk-control areas that need additional scrutiny and/or investment, but in a completely anonymized and privacy-preserving way.
Why Artificial Intelligence May Not Be As Useful or As Challenging As Artificial Stupidity
2019A commentary on the article, "Artificial Intelligence—The Revolution Hasn’t Happened Yet" by Michael I. Jordan, published by Harvard Data Science Review (July 2019).
Estimation of Clinical Trial Success Rates and Related Parameters
2019Previous estimates of drug development success rates rely on relatively small samples from databases curated by the pharmaceutical industry and are subject to potential selection biases. Using a sample of 406,038 entries of clinical trial data for over 21,143 compounds from January 1, 2000 to October 31, 2015, we estimate aggregate clinical trial success rates and durations. We also compute disaggregated estimates across several trial features including disease type, clinical phase, industry or academic sponsor, biomarker presence, lead indication status, and time. In several cases, our results differ significantly in detail from widely cited statistics. For example, oncology has a 3.4% success rate in our sample vs. 5.1% in prior studies. However, after declining to 1.7% in 2012, this rate has improved to 2.5% and 8.3% in 2014 and 2015, respectively. In addition, trials that use biomarkers in patient-selection have higher overall success probabilities than trials without biomarkers.
If Liberal Democracies Can Resist the Urge to Micromanage the Economy, Big Data Could Catalyze a New Capitalism
2018Capitalism is a powerful tool: By compressing enormous amounts of information regarding supply and demand into a single number—the market price—buyers and sellers are able to make remarkably intelligent decisions simply by engaging in self-interested behavior. But in a big-data world, where a supercomputer can fit into our pocket and a simple Internet search allows us to find every product under the Sun, do we still need it?
In Reinventing Capitalism in the Age of Big Data, Viktor Mayer-Schönberger and Thomas Ramge argue that big data will transform our economies on a fundamental level. Money will become obsolete, they argue, replaced by metadata. Instead of a single market price for each commodity, sophisticated matching algorithms will use a bundle of specifications and personal preferences to select just the right product for you. Artificial intelligence powered by machine-learning techniques will relentlessly negotiate the best possible transaction on your behalf. Capital will still be important, they concede, but increasingly just for its signaling content. “Venture informers” might even replace venture capitalists.
Why Robo-Advisors Need Artificial Stupidity
2018‘Fintech’ is transforming the financial sector at a pace that is now obvious even to the casual observer. We see this not only in daily headlines about initial coin offerings or financial applications of blockchain technology, but also in the daily experiences of the average consumer: paper cheques consigned forever to desk drawers, automatic currency conversions on a trip abroad, the rapid approval of an online loan – and most excitingly for some, personal investing.