Explainable Machine Learning Models of Consumer Credit Risk (Working Paper) 2022
In this paper, we create machine learning (ML) models to forecast home equity credit risk for individuals using a real-world dataset and demonstrate methods to explain the output of these ML models to make them more accessible to the end-user. We analyze the explainability of these models for various stakeholders: loan companies, regulators, loan applicants, and data scientists, incorporating their different requirements with respect to explanations. For loan companies, we generate explanations for every model prediction of creditworthiness. For regulators, we perform a stress test for extreme scenarios. For loan applicants, we generate diverse counterfactuals to guide them with steps to reverse the model's classification. Finally, for data scientists, we generate simple rules that accurately explain 70-72% of the dataset. Our work is intended to accelerate the adoption of ML techniques in domains that would benefit from explanations of their predictions.
Algorithmic Models of Investor Behavior 2021
We propose a heuristic approach to modeling investor behavior by simulating combinations of simpler systematic investment strategies associated with well-known behavioral biases—in functional forms motivated by an extensive review of the behavioral finance literature—using parameters calibrated from historical data. We compute the investment performance of these heuristics individually and in pairwise combinations using both simulated and historical asset-class returns. The mean-reversion or momentum nature of a heuristic can often explain its effect on performance, depending on whether asset returns are consistent with such dynamics. These algorithms show that seemingly irrational investor behavior may, in fact, have been shaped by evolutionary forces and can be effective in certain environments and maladaptive in others.
SCRAM: A Platform for Securely Measuring Cyber Risk 2020
We develop a new cryptographic platform called SCRAM (Secure Cyber Risk Aggregation and Measurement) that allows multiple entities to compute aggregate cyber-risk measures without requiring any entity to disclose its own sensitive data on cyberattacks, penetrations, and losses. Using the SCRAM platform, we present results from two computations in a pilot study with six large private-sector companies: (1) benchmarks of the adoption rates of 171 critical security measures and (2) links between monetary losses from 49 security incidents and the specific sub-control failures implicated in each incident. These results provide insight into problematic cyber-risk-control areas that need additional scrutiny and/or investment, but in a completely anonymized and privacy-preserving way.
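The abstract does not describe SCRAM's protocol, so the sketch below swaps in additive secret sharing, a generic textbook technique, purely to illustrate how an aggregate can be computed without any party revealing its own input. The function names, the modulus, and the adoption data are all hypothetical and are not taken from the paper.

```python
import secrets

# Hypothetical modulus; any value larger than the true aggregate works.
MODULUS = 2**61 - 1

def make_shares(value, n_parties, modulus=MODULUS):
    """Split an integer into n_parties random shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]

def secure_sum(private_values, modulus=MODULUS):
    """Aggregate private integers so that no single share reveals an input."""
    n = len(private_values)
    # Row i holds the shares that firm i hands out, one per party.
    distributed = [make_shares(v, n, modulus) for v in private_values]
    # Party j adds the shares it received (column j) and publishes only that sum.
    partial_sums = [sum(row[j] for row in distributed) % modulus for j in range(n)]
    return sum(partial_sums) % modulus

# Made-up data: 1 if a firm has adopted a given security measure, else 0.
adopted = [1, 0, 1, 1, 0, 1]
total = secure_sum(adopted)
adoption_rate = total / len(adopted)
print(f"aggregate adoption rate: {adoption_rate:.2f}")
```

Each individual share is a uniformly random number, so only the published aggregate carries information about the inputs. A production system would also need authenticated channels and protection against colluding parties, which this sketch ignores.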
Why Artificial Intelligence May Not Be As Useful or As Challenging As Artificial Stupidity 2019
A commentary on the article, "Artificial Intelligence—The Revolution Hasn’t Happened Yet" by Michael I. Jordan, published by Harvard Data Science Review (July 2019).
Estimation of Clinical Trial Success Rates and Related Parameters 2019
Previous estimates of drug development success rates rely on relatively small samples from databases curated by the pharmaceutical industry and are subject to potential selection biases. Using a sample of 406,038 entries of clinical trial data for over 21,143 compounds from January 1, 2000 to October 31, 2015, we estimate aggregate clinical trial success rates and durations. We also compute disaggregated estimates across several trial features including disease type, clinical phase, industry or academic sponsor, biomarker presence, lead indication status, and time. In several cases, our results differ significantly in detail from widely cited statistics. For example, oncology has a 3.4% success rate in our sample vs. 5.1% in prior studies. However, after declining to 1.7% in 2012, this rate has improved to 2.5% and 8.3% in 2014 and 2015, respectively. In addition, trials that use biomarkers in patient-selection have higher overall success probabilities than trials without biomarkers.
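As a rough illustration of how an aggregate success rate can be built from phase-level data, the sketch below multiplies phase-transition rates together. This is one common convention and not necessarily the estimator used in the paper; the counts are invented for the example and the function names are hypothetical.

```python
from math import prod

def phase_success_rates(entered, advanced):
    """Fraction of programs entering each phase that advance to the next one."""
    return {phase: advanced[phase] / entered[phase] for phase in entered}

def overall_success_rate(rates):
    """Probability of going from the first phase to approval, as a product of phase rates."""
    return prod(rates.values())

# Made-up counts, not the paper's data.
entered = {"phase1": 1000, "phase2": 600, "phase3": 200, "approval": 100}
advanced = {"phase1": 600, "phase2": 200, "phase3": 100, "approval": 80}

rates = phase_success_rates(entered, advanced)
overall = overall_success_rate(rates)
print(f"overall probability of success: {overall:.1%}")
```

Disaggregated estimates of the kind the abstract mentions (by disease type, sponsor, or biomarker use) would apply the same calculation to each subset of trials.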
If Liberal Democracies Can Resist the Urge to Micromanage the Economy, Big Data Could Catalyze a New Capitalism 2018
Capitalism is a powerful tool: By compressing enormous amounts of information regarding supply and demand into a single number—the market price—buyers and sellers are able to make remarkably intelligent decisions simply by engaging in self-interested behavior. But in a big-data world, where a supercomputer can fit into our pocket and a simple Internet search allows us to find every product under the Sun, do we still need it?
In Reinventing Capitalism in the Age of Big Data, Viktor Mayer-Schönberger and Thomas Ramge argue that big data will transform our economies on a fundamental level. Money will become obsolete, they argue, replaced by metadata. Instead of a single market price for each commodity, sophisticated matching algorithms will use a bundle of specifications and personal preferences to select just the right product for you. Artificial intelligence powered by machine-learning techniques will relentlessly negotiate the best possible transaction on your behalf. Capital will still be important, they concede, but increasingly just for its signaling content. “Venture informers” might even replace venture capitalists.
Why Robo-Advisors Need Artificial Stupidity 2018
‘Fintech’ is transforming the financial sector at a pace that is now obvious even to the casual observer. We see this not only in daily headlines about initial coin offerings or financial applications of blockchain technology, but also in the daily experiences of the average consumer: paper cheques consigned forever to desk drawers, automatic currency conversions on a trip abroad, the rapid approval of an online loan – and most excitingly for some, personal investing.
Cryptocurrencies: King’s Ransom or Fool’s Gold? 2018
The increasing dominance of technology in our daily lives is finally penetrating the financial industry as well. The growing popularity of algorithmic trading, mobile payment platforms and robo-advisers is just the beginning of the fintech revolution. But perhaps the most radical, and controversial, innovation in today's headlines is cryptocurrencies. Extreme volatility makes these products an unreliable store of value, for now.
Momentum, Mean-Reversion, and Social Media: Evidence from StockTwits and Twitter 2018
In this article, the authors analyze the relation between stock market liquidity and real-time measures of sentiment obtained from the social-media platforms StockTwits and Twitter. The authors find that extreme sentiment corresponds to higher demand for and lower supply of liquidity, with negative sentiment having a much larger effect on demand and supply than positive sentiment. Their intraday event study shows that booms and panics end when bullish and bearish sentiment reach extreme levels, respectively. After extreme sentiment, prices become more mean-reverting and spreads narrow. To quantify the magnitudes of these effects, the authors conduct a historical simulation of a market-neutral mean-reversion strategy that uses social-media information to determine its portfolio allocations. These results suggest that the demand for and supply of liquidity are influenced by investor sentiment and that market makers who can keep their transaction costs to a minimum are able to profit by using extreme bullish and bearish emotions in social media as a real-time barometer for the end of momentum and a return to mean reversion.
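The abstract does not spell out the strategy's portfolio rule. As a minimal sketch of what a market-neutral mean-reversion allocation can look like, the code below uses a standard contrarian weighting that buys the previous period's losers and sells its winners. It leaves out the social-media signal entirely, and the function names and returns are hypothetical.

```python
def contrarian_weights(prev_returns):
    """Weights proportional to minus each asset's excess return over the cross-sectional mean."""
    n = len(prev_returns)
    market = sum(prev_returns) / n
    return [-(r - market) / n for r in prev_returns]

def strategy_return(weights, next_returns):
    """Profit per unit of notional from holding the weights over the next period."""
    return sum(w * r for w, r in zip(weights, next_returns))

# Made-up one-period returns for four assets.
prev = [0.02, -0.01, 0.00, -0.03]
weights = contrarian_weights(prev)

# Longs and shorts offset, so the portfolio has zero net investment.
print(weights, sum(weights))
```

Because the weights sum to zero, the position is dollar-neutral; it profits only if past winners subsequently underperform past losers, which is the mean-reversion pattern the abstract associates with the period after extreme sentiment.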
Moore’s Law vs. Murphy’s Law in the Financial System: Who’s Winning? 2017
Breakthroughs in computing hardware, software, telecommunications, and data analytics have transformed the financial industry, enabling a host of new products and services such as automated trading algorithms, crypto-currencies, mobile banking, crowdfunding, and robo-advisors. However, the unintended consequences of technology-leveraged finance include firesales, flash crashes, botched initial public offerings, cybersecurity breaches, catastrophic algorithmic trading errors, and a technological arms race that has created new winners, losers, and systemic risk in the financial ecosystem. These challenges are an unavoidable aspect of the growing importance of finance in an increasingly digital society. Rather than fighting this trend or forswearing technology, the ultimate solution is to develop more robust technology capable of adapting to the foibles in human behavior so users can employ these tools safely, effectively, and effortlessly. Examples of such technology are provided.
The Wisdom of Twitter Crowds: Predicting Stock Market Reactions to FOMC Meetings via Twitter Feeds 2016
With the rise of social media, investors have a new tool for measuring sentiment in real time. However, the nature of these data sources raises serious questions about their quality. Because anyone on social media can participate in a conversation about markets, whether the individual is informed or not, these data may have very little information about future asset prices. In this article, the authors show that this is not the case. They analyze a recurring event that has a high impact on asset prices, Federal Open Market Committee (FOMC) meetings, and exploit a new dataset of tweets referencing the Federal Reserve. The authors show that the content of tweets can be used to predict future returns, even after controlling for common asset pricing factors. To gauge the economic magnitude of these predictions, the authors construct a simple hypothetical trading strategy based on these data. They find that a tweet-based asset allocation strategy outperforms several benchmarks, including a strategy that buys and holds a market index, as well as a comparable dynamic asset allocation strategy that does not use Twitter information.
Q Group Panel Discussion: Looking to the Future 2016
Moderator Martin Leibowitz asked a panel of industry experts—Andrew W. Lo, Robert C. Merton, Stephen A. Ross, and Jeremy Siegel—what they saw as the most important issues in finance, especially as those issues relate to practitioners. Drawing on their vast knowledge, these panelists addressed topics such as regulation, technology, and financing society’s challenges; opacity and trust; the social value of finance; and future expected returns.
Imagine if Robo Advisers Could Do Emotions 2016
WSJ Wealth Expert Andrew W. Lo of MIT says robo advisers are the rotary phones to today’s iPhone: technology that has great potential but is still immature.
Law Is Code: A Software Engineering Approach to Analyzing the United States Code 2015
The agglomeration of rules and regulations over time has produced a body of legal code that no single individual can fully comprehend. This complexity produces inefficiencies, makes the processes of understanding and changing the law difficult, and frustrates the fundamental principle that the law should provide fair notice to the governed. In this Article, we take a quantitative, unbiased, and software-engineering approach to analyze the evolution of the United States Code from 1926 to today. Software engineers frequently face the challenge of understanding and managing large, structured collections of instructions, directives, and conditional statements, and we adapt and apply their techniques to the U.S. Code over time. Our work produces insights into the structure of the U.S. Code as a whole, its strengths and vulnerabilities, and new ways of thinking about individual laws. For example, we identify the first appearance and spread of important terms in the U.S. Code like “whistleblower” and “privacy.” We also analyze and visualize the network structure of certain substantial reforms, including the Patient Protection and Affordable Care Act and the Dodd-Frank Wall Street Reform and Consumer Protection Act, and show how the interconnections of references can increase complexity and create the potential for unintended consequences. Our work is a timely illustration of computational approaches to law as the legal profession embraces technology for scholarship in order to increase efficiency and to improve access to justice.
Learning Connections in Financial Time Series 2013
To reduce risk, investors seek assets that have high expected return and are unlikely to move in tandem. Correlation measures are generally used to quantify the connections between equities. The 2008 financial crisis, and its aftermath, demonstrated the need for a better way to quantify these connections. We present a machine learning-based method to build a connectedness matrix to address the shortcomings of correlation in capturing events such as large losses. Our method uses an unconstrained optimization to learn this matrix, while ensuring that the resulting matrix is positive semi-definite. We show that this matrix can be used to build portfolios that not only "beat the market," but also outperform optimal (i.e., minimum variance) portfolios.
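The abstract does not say how positive semi-definiteness is enforced. One standard way to keep an optimization unconstrained while guaranteeing that property, shown here only as an assumption about how such a method could work, is to optimize a free factor matrix A and define the connectedness matrix as C = A Aᵀ. The helper names and the random factor below are hypothetical.

```python
import random

def gram_matrix(a):
    """Return a @ a.T for a list-of-lists matrix; positive semi-definite by construction."""
    n = len(a)
    return [[sum(x * y for x, y in zip(a[i], a[j])) for j in range(n)] for i in range(n)]

def quadratic_form(c, x):
    """Compute x.T @ c @ x, which is non-negative for every x when c is PSD."""
    n = len(x)
    return sum(x[i] * c[i][j] * x[j] for i in range(n) for j in range(n))

rng = random.Random(0)
# Free parameters: any real values are allowed, so no constraint is needed on them.
a = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
c = gram_matrix(a)

# Spot-check the PSD property on random directions.
checks = [quadratic_form(c, [rng.uniform(-1, 1) for _ in range(4)]) for _ in range(100)]
print(min(checks) >= -1e-12)
```

Since xᵀCx equals the squared norm of Aᵀx, the quadratic form can never be negative, whatever values an optimizer assigns to A.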