Publications
Law Is Code: A Software Engineering Approach to Analyzing the United States Code
2015
The agglomeration of rules and regulations over time has produced a body of legal code that no single individual can fully comprehend. This complexity produces inefficiencies, makes the processes of understanding and changing the law difficult, and frustrates the fundamental principle that the law should provide fair notice to the governed. In this Article, we take a quantitative, unbiased, and software-engineering approach to analyze the evolution of the United States Code from 1926 to today. Software engineers frequently face the challenge of understanding and managing large, structured collections of instructions, directives, and conditional statements, and we adapt and apply their techniques to the U.S. Code over time. Our work produces insights into the structure of the U.S. Code as a whole, its strengths and vulnerabilities, and new ways of thinking about individual laws. For example, we identify the first appearance and spread of important terms in the U.S. Code like “whistleblower” and “privacy.” We also analyze and visualize the network structure of certain substantial reforms, including the Patient Protection and Affordable Care Act and the Dodd-Frank Wall Street Reform and Consumer Protection Act, and show how the interconnections of references can increase complexity and create the potential for unintended consequences. Our work is a timely illustration of computational approaches to law as the legal profession embraces technology for scholarship in order to increase efficiency and to improve access to justice.
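One analysis the abstract mentions is locating the first appearance of a term across successive editions of the U.S. Code. The sketch below illustrates that idea on toy data; it is not the Article's pipeline, and `code_by_year`/`snapshots` are hypothetical stand-ins for a parsed corpus of yearly editions.

```python
def first_appearance(term, code_by_year):
    """Return the earliest year whose full text contains `term`, or None."""
    for year in sorted(code_by_year):
        if term.lower() in code_by_year[year].lower():
            return year
    return None

# Toy example; real inputs would be the full text of each edition of the Code.
snapshots = {1926: "...", 1978: "... whistleblower protections ...", 2012: "..."}
print(first_appearance("whistleblower", snapshots))  # -> 1978
```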
Learning Connections in Financial Time Series
2013
To reduce risk, investors seek assets that have high expected return and are unlikely to move in tandem. Correlation measures are generally used to quantify the connections between equities. The 2008 financial crisis, and its aftermath, demonstrated the need for a better way to quantify these connections. We present a machine learning-based method to build a connectedness matrix to address the shortcomings of correlation in capturing events such as large losses. Our method uses an unconstrained optimization to learn this matrix, while ensuring that the resulting matrix is positive semi-definite. We show that this matrix can be used to build portfolios that not only "beat the market," but also outperform optimal (i.e., minimum variance) portfolios.
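A standard way to keep a learned matrix positive semidefinite while optimizing without constraints is to parametrize it as M = BBᵀ and fit B directly. The sketch below illustrates that device with a toy squared-error objective; the paper's actual loss (which emphasizes joint large losses) and data are not reproduced here, so the target and step sizes are assumptions for illustration only.

```python
import numpy as np

def fit_psd_matrix(returns, n_iter=500, lr=1e-3):
    """Learn an n x n PSD matrix from a (T, n) array of asset returns."""
    T, n = returns.shape
    B = np.random.default_rng(0).normal(scale=0.1, size=(n, n))
    target = (returns.T @ returns) / T        # toy target: second-moment matrix
    for _ in range(n_iter):
        M = B @ B.T                           # positive semidefinite by construction
        grad = 4 * (M - target) @ B           # gradient of ||M - target||_F^2 w.r.t. B
        B -= lr * grad
    return B @ B.T
```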
Using Algorithmic Attribution Techniques To Determine Authorship In Unsigned Judicial Opinions
2013
This article proposes a novel and provocative analysis of judicial opinions that are published without indicating individual authorship. Our approach provides an unbiased, quantitative, and computer scientific answer to a problem that has long plagued legal commentators. Our work uses natural language processing to predict authorship of judicial opinions that are unsigned or whose attribution is disputed. Using a dataset of Supreme Court opinions with known authorship, we identify key words and phrases that can, to a high degree of accuracy, predict authorship. Thus, our method makes accessible an important class of cases heretofore inaccessible. For illustrative purposes, we explain our process as applied to the Obamacare decision, in which the authorship of a joint dissent was subject to significant popular speculation. We conclude with a chart predicting the author of every unsigned per curiam opinion during the Roberts Court.
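The underlying technique is supervised text classification: learn word-and-phrase features from opinions of known authorship, then score an unsigned opinion. The sketch below shows that generic pattern with scikit-learn on placeholder inputs; it is not the authors' feature set or model, and `signed_opinions`, `authors`, and `unsigned_opinion` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

signed_opinions = ["text of opinion one ...", "text of opinion two ..."]  # known authorship
authors = ["Justice A", "Justice B"]                                      # matching labels
unsigned_opinion = "text of a per curiam opinion ..."

vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)  # single words and word pairs as features
X = vectorizer.fit_transform(signed_opinions)
clf = LogisticRegression(max_iter=1000).fit(X, authors)

print(clf.predict(vectorizer.transform([unsigned_opinion]))[0])  # predicted author
```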
Moore’s Law versus Murphy’s Law: Algorithmic Trading and Its Discontents
2013
Financial markets have undergone a remarkable transformation over the past two decades due to advances in technology. These advances include faster and cheaper computers, greater connectivity among market participants, and perhaps most important of all, more sophisticated trading algorithms. The benefits of such financial technology are evident: lower transactions costs, faster executions, and greater volume of trades. However, like any technology, trading technology has unintended consequences. In this paper, we review key innovations in trading technology starting with portfolio optimization in the 1950s and ending with high-frequency trading in the late 2000s, as well as opportunities, challenges, and economic incentives that accompanied these developments. We also discuss potential threats to financial stability created or facilitated by algorithmic trading and propose “Financial Regulation 2.0,” a set of design principles for bringing the current financial regulatory framework into the Digital Age.
Finance is in Need of a Technological Revolution
2012
The financial system has reached a level of complexity that only “power users” – highly trained experts with domain-specific knowledge – are able to manage. But because technological advances have come so quickly and are often adopted so broadly, there are not enough power users to go around. The interconnectedness of financial markets and institutions has created a new form of financial accident: a systemic event that extends beyond the borders of any single organisation.
Privacy-Preserving Methods for Sharing Financial Risk Exposures
2012
Unlike other industries in which intellectual property is patentable, the financial industry relies on trade secrecy to protect its business processes and methods, which can obscure critical financial risk exposures from regulators and the public. We develop methods for sharing and aggregating such risk exposures that protect the privacy of all parties involved and without the need for a trusted third party. Our approach employs secure multi-party computation techniques from cryptography in which multiple parties are able to compute joint functions without revealing their individual inputs. In our framework, individual financial institutions evaluate a protocol on their proprietary data which cannot be inverted, leading to secure computations of real-valued statistics such as concentration indexes, pairwise correlations, and other single- and multi-point statistics. The proposed protocols are computationally tractable on realistic sample sizes. Potential financial applications include: the construction of privacy-preserving real-time indexes of bank capital and leverage ratios; the monitoring of delegated portfolio investments; financial audits; and the publication of new indexes of proprietary trading strategies.
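One simple building block behind protocols of this kind is additive secret sharing: each institution splits its exposure into random shares whose sum reconstructs the value, so the aggregate can be computed while no party reveals its own input. The sketch below illustrates the general idea; it is not the paper's protocols, and the modulus and exposure figures are placeholders.

```python
import secrets

PRIME = 2**61 - 1  # a large prime modulus for the arithmetic

def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Three institutions share their exposures; only the total is ever reconstructed.
exposures = [120, 45, 310]
all_shares = [share(v, 3) for v in exposures]
partial_sums = [sum(col) % PRIME for col in zip(*all_shares)]  # each party sums what it holds
print(sum(partial_sums) % PRIME)  # 475, the aggregate, with no individual input revealed
```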
A Computational View of Market Efficiency
2011
We propose to study market efficiency from a computational viewpoint. Borrowing from theoretical computer science, we define a market to be efficient with respect to resources S (e.g., time, memory) if no strategy using resources S can make a profit. As a first step, we consider memory-m strategies whose action at time t depends only on the m previous observations at times t-m, ..., t-1. We introduce and study a simple model of market evolution, where strategies impact the market by their decision to buy or sell. We show that the effect of optimal strategies using memory m can lead to "market conditions" that were not present initially, such as (1) market bubbles and (2) the possibility for a strategy using memory m' > m to make a bigger profit than was initially possible. We suggest ours as a framework to rationalize the technological arms race of quantitative trading firms.
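To make the object of study concrete: a memory-m strategy can be viewed as a lookup table from the last m price moves to a buy/sell action. The toy sketch below brute-forces the most profitable memory-1 table on a fixed sequence of +1/-1 moves; it ignores the market impact that the paper models and is only an illustration of the definition.

```python
from itertools import product

moves = [1, -1, 1, 1, -1, 1, -1, -1, 1, 1]   # toy sequence of price changes
m = 1

def profit(table, moves):
    """Buy (+1) or sell (-1) based on the previous m moves; earn action * next move."""
    total = 0
    for t in range(m, len(moves)):
        context = tuple(moves[t - m:t])
        total += table[context] * moves[t]
    return total

contexts = list(product((1, -1), repeat=m))
best = max(
    (dict(zip(contexts, actions)) for actions in product((1, -1), repeat=len(contexts))),
    key=lambda tbl: profit(tbl, moves),
)
print(best, profit(best, moves))
```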
Securities Trading of Concepts (STOC)
2011
Identifying winning new product concepts can be a challenging process that requires insight into private consumer preferences. To measure consumer preferences for new product concepts, the authors apply a 'securities trading of concepts,' or STOC, approach, in which new product concepts are traded as financial securities. The authors apply this method because market prices are known to efficiently collect and aggregate private information regarding the economic value of goods, services, and firms, particularly when trading financial securities. This research compares the STOC approach against stated-choice, conjoint, constant-sum, and longitudinal revealed-preference data. The authors also place STOC in the context of previous research on prediction markets and experimental economics. The authors conduct a series of experiments in multiple product categories to test whether STOC (1) is more cost efficient than other methods, (2) passes validity tests, (3) measures expectations of others, and (4) reveals individual preferences, not just those of the crowd. The results also show that traders exhibit bias on the basis of self-preferences when trading. Ultimately, STOC offers two key advantages over traditional market research methods: cost efficiency and scalability. For new product development teams deciding how to invest resources, this scalability may be especially important in the Web 2.0 world, in which customers are constantly interacting with firms and one another in suggesting numerous product design possibilities that need to be screened.
Consumer Credit Risk Models via Machine-Learning Algorithms
2010
We apply machine-learning techniques to construct nonlinear nonparametric forecasting models of consumer credit risk. By combining customer transactions and credit bureau data from January 2005 to April 2009 for a sample of a major commercial bank's customers, we are able to construct out-of-sample forecasts that significantly improve the classification rates of credit-card-holder delinquencies and defaults, with linear regression R-squared's of forecasted/realized delinquencies of 85%. Using conservative assumptions for the costs and benefits of cutting credit lines based on machine-learning forecasts, we estimate the cost savings to range from 6% to 25% of total losses. Moreover, the time-series patterns of estimated delinquency rates from this model over the course of the recent financial crisis suggest that aggregated consumer-credit risk analytics may have important applications in forecasting systemic risk.
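The modeling pattern described here is nonlinear, nonparametric classification of delinquency with out-of-sample evaluation. The sketch below shows that pattern with a generic tree ensemble on synthetic data; the paper's actual features, model, and bank data are not reproduced, so everything below the imports is a stand-in.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))                      # stand-in transaction/bureau features
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=5000) > 1.5).astype(int)  # toy default flag

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("out-of-sample accuracy:", model.score(X_te, y_te))
```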
Personal Indexes
2001
Artificial intelligence has transformed financial technology in many ways and in this review article, three of the most promising applications are discussed: neural networks, data mining, and pattern recognition. Just as indexes are meant to facilitate the summary and extraction of information in an efficient manner, sophisticated automated algorithms can now perform similar functions but at higher and more powerful levels. In some cases, artificial intelligence can save us from natural stupidity.
Computational Challenges in Portfolio Management
2001
The financial industry is one of the fastest-growing areas of scientific computing. Two decades ago, terms such as financial engineering, computational finance, and financial mathematics did not exist in common usage. Today, these areas are distinct and enormously popular academic disciplines with their own journals, conferences, and professional societies. One explanation for this area’s remarkable growth and the impressive array of mathematicians, computer scientists, physicists, and economists that are drawn to it is the formidable intellectual challenges intrinsic to financial markets. Many of the most basic problems in financial analysis are unsolved and surprisingly resilient to the onslaught of researchers from diverse disciplines. In this article, we hope to give a sense of these challenges by describing a relatively simple problem that all investors face when managing a portfolio of financial securities over time. Such a problem becomes more complex once real-world considerations factor into its formulation. We present the basic dynamic portfolio optimization problem and then consider three aspects of it: taxes, investor preferences, and portfolio constraints. These three issues are by no means exhaustive—they merely illustrate examples of the kinds of challenges financial engineers face today. Examples of other computational issues in portfolio optimization appear elsewhere.
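As a point of reference for the "relatively simple problem" the article starts from, the sketch below computes the one-period minimum-variance portfolio with weights summing to one, using the closed form w* = Σ⁻¹1 / (1ᵀΣ⁻¹1). The covariance matrix is a toy input; the taxes, preferences, and constraints discussed in the article are omitted.

```python
import numpy as np

sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])   # toy covariance matrix for three assets
ones = np.ones(3)
w = np.linalg.solve(sigma, ones)         # proportional to inverse-covariance times ones
w /= w.sum()                             # normalize so the weights sum to 1
print("weights:", w, "portfolio variance:", w @ sigma @ w)
```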
Finance: A Selective Survey
2001
Ever since the publication in 1565 of Girolamo Cardano's treatise on gambling, Liber de Ludo Aleae (The Book of Games of Chance), statistics and financial markets have become inextricably linked. Over the past few decades many of these links have become part of the canon of modern finance, and it is now impossible to fully appreciate the workings of financial markets without them. This selective survey covers three of the most important ideas of finance—efficient markets, the random walk hypothesis, and derivative pricing models—that illustrate the enormous research opportunities that lie at the intersection of finance and statistics.
Foundations of Technical Analysis: Computational Algorithms, Statistical Inference, and Empirical Implementation
2000
Technical analysis, also known as "charting," has been a part of financial practice for many decades, but this discipline has not received the same level of academic scrutiny and acceptance as more traditional approaches such as fundamental analysis. One of the main obstacles is the highly subjective nature of technical analysis—the presence of geometric shapes in historical price charts is often in the eyes of the beholder. In this paper, we propose a systematic and automatic approach to technical pattern recognition using nonparametric kernel regression, and apply this method to a large number of U.S. stocks from 1962 to 1996 to evaluate the effectiveness of technical analysis. By comparing the unconditional empirical distribution of daily stock returns to the conditional distribution—conditioned on specific technical indicators such as head-and-shoulders or double-bottoms—we find that over the 31-year sample period, several technical indicators do provide incremental information and may have some practical value.
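The smoothing step this approach builds on is Nadaraya-Watson kernel regression of prices on the time index. The sketch below shows that step with a Gaussian kernel on a simulated price path; the paper's subsequent pattern detection (e.g., identifying head-and-shoulders from local extrema of the smoothed curve) and its bandwidth selection are not reproduced here.

```python
import numpy as np

def kernel_smooth(prices, bandwidth):
    """Nadaraya-Watson smoothing of a price series with a Gaussian kernel over time."""
    t = np.arange(len(prices))
    weights = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
    return (weights @ prices) / weights.sum(axis=1)

prices = 100 + np.cumsum(np.random.default_rng(0).normal(size=250))  # toy daily price path
smoothed = kernel_smooth(prices, bandwidth=5.0)                       # local extrema of this curve would feed pattern rules
```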
Data-Snooping Biases in Financial Analysis
1994
Data-snooping—finding seemingly significant but in fact spurious patterns in the data—is a serious problem in financial analysis. Although it afflicts all non-experimental sciences, data-snooping is particularly problematic for financial analysis because of the large number of empirical studies performed on the same datasets. Given enough time, enough attempts, and enough imagination, almost any pattern can be teased out of any dataset. In some cases, these spurious patterns are statistically small, almost unnoticeable in isolation. But because small effects in financial calculations can often lead to very large differences in investment performance, data-snooping biases can be surprisingly substantial. In this review article, I provide several examples of data-snooping biases, explain why it is impossible to eliminate them completely, and propose several ways to guard against the most extreme forms of data-snooping in financial analysis.
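The mechanism is easy to reproduce numerically: test enough candidate "signals" against the same pure-noise return series and some will look statistically significant by chance alone. The simulation below is an illustrative sketch, not an example from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
returns = rng.normal(0, 0.01, size=n)                 # pure-noise daily returns

max_abs_t = 0.0
for _ in range(200):                                  # 200 unrelated candidate predictors
    signal = rng.normal(size=n)
    r = np.corrcoef(signal, returns)[0, 1]
    t_stat = r * np.sqrt((n - 2) / (1 - r * r))       # t-statistic for the correlation
    max_abs_t = max(max_abs_t, abs(t_stat))

print("largest |t| across 200 useless signals:", round(max_abs_t, 2))
# Values above 1.96 ("significant at 5%") appear routinely with this many tries.
```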
Data Snooping Biases in Tests of Financial Asset Pricing Models
1990
Tests of financial asset pricing models may yield misleading inferences when properties of the data are used to construct the test statistics. In particular, such tests are often based on returns to portfolios of common stock, where portfolios are constructed by sorting on some empirically motivated characteristic of the securities such as market value of equity. Analytical calculations, Monte Carlo simulations, and two empirical examples show that the effects of this type of data snooping can be substantial.
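A stripped-down Monte Carlo version of the effect: if portfolios are formed by sorting on a characteristic estimated from the same data used for testing (here, the in-sample mean return of securities that truly have zero mean), the top-minus-bottom spread looks spuriously large. This sketch is an illustration of the general phenomenon, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 60, 500                                        # months of returns, number of securities
spreads = []
for _ in range(200):                                  # 200 simulated samples
    returns = rng.normal(0.0, 0.05, size=(T, N))      # every security truly has zero mean
    means = returns.mean(axis=0)
    order = np.argsort(means)
    top = returns[:, order[-50:]].mean()              # portfolio of the 50 "best" in-sample
    bottom = returns[:, order[:50]].mean()            # portfolio of the 50 "worst" in-sample
    spreads.append(top - bottom)

print("average spurious top-minus-bottom spread:", round(float(np.mean(spreads)), 4))
```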