Data-Snooping Biases in Financial Analysis (1994)
Data-snooping—finding seemingly significant but in fact spurious patterns in the data—is a serious problem in financial analysis. Although it afflicts all non-experimental sciences, data-snooping is particularly problematic for financial analysis because of the large number of empirical studies performed on the same datasets. Given enough time, enough attempts, and enough imagination, almost any pattern can be teased out of any dataset. In some cases, these spurious patterns are statistically small, almost unnoticeable in isolation. But because small effects in financial calculations can often lead to very large differences in investment performance, data-snooping biases can be surprisingly substantial. In this review article, I provide several examples of data-snooping biases, explain why it is impossible to eliminate them completely, and propose several ways to guard against the most extreme forms of data-snooping in financial analysis.
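The simulation below is a minimal sketch of the "enough attempts" point. It is not taken from the article. It assumes 1,000 hypothetical trading strategies whose monthly returns are pure i.i.d. noise with a true mean of zero. It computes a conventional t-statistic for each strategy and then reports only the best one. All parameter values (1,000 strategies, 120 months, 4% monthly volatility) are illustrative assumptions.

```python
import numpy as np

# Hypothetical setup: every "strategy" is pure noise, i.i.d. normal monthly
# returns with a true mean of exactly zero.
rng = np.random.default_rng(seed=0)
n_strategies, n_months = 1000, 120
returns = rng.normal(loc=0.0, scale=0.04, size=(n_strategies, n_months))

# Conventional t-statistic for H0: mean return = 0, computed per strategy.
means = returns.mean(axis=1)
std_errs = returns.std(axis=1, ddof=1) / np.sqrt(n_months)
t_stats = means / std_errs

# Data-snooping step: keep only the best-looking strategy.
best = int(np.argmax(t_stats))
best_t = t_stats[best]
share_significant = np.mean(np.abs(t_stats) > 1.96)

print(f"best t-stat among {n_strategies} noise strategies: {best_t:.2f}")
print(f"share with |t| > 1.96: {share_significant:.3f}")
```

Roughly 5% of the noise strategies clear the usual 1.96 cutoff, which matches the test's nominal size. The single best t-statistic is the one a snooper would report, and it typically lies well above 2.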
Neural Networks and Other Nonparametric Techniques in Economics and Finance (1994)
Although they are only one of the many types of statistical tools for modeling nonlinear relationships, neural networks seem to be surrounded by a great deal of mystique and, sometimes, misunderstanding. Because they have their roots in neurophysiology and the cognitive sciences, neural networks are often assumed to have brain-like qualities: learning capacity, problem-solving abilities, and ultimately, cognition and self-awareness. Alternatively, neural networks are often viewed as "black boxes" that can yield accurate predictions with little modeling effort. In this review paper, I hope to remove some of the mystique and misunderstandings about neural networks by providing some simple examples of what they are, what they can and cannot do, and where neural nets might be profitably applied in financial contexts.
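One simple example of "what they are" is sketched below. It writes out one common form, a single-hidden-layer feedforward network with logistic activations, as an ordinary nonlinear function of its inputs. The function and weight names are hypothetical. In practice the weights would be estimated from data, for example by nonlinear least squares, rather than set by hand.

```python
import numpy as np

def logistic(z):
    """Logistic 'squashing' activation, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def single_layer_net(x, gamma, beta0, beta):
    """Output of a one-hidden-layer feedforward network (hypothetical helper).

    x     : (n_obs, n_inputs) input matrix
    gamma : (n_hidden, n_inputs + 1) hidden-unit weights; column 0 is a bias
    beta0 : scalar output bias
    beta  : (n_hidden,) output weights
    Returns f(x) = beta0 + sum_j beta_j * logistic(gamma_j0 + x @ gamma_j[1:]).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    hidden = logistic(gamma[:, 0] + x @ gamma[:, 1:].T)  # (n_obs, n_hidden)
    return beta0 + hidden @ beta

# Hypothetical weights: two hidden units with mirror-image slopes on one input.
gamma = np.array([[0.0, 5.0], [0.0, -5.0]])
beta0, beta = 0.0, np.array([1.0, 1.0])
print(single_layer_net([[0.3]], gamma, beta0, beta))  # ≈ 1.0, since logistic(a) + logistic(-a) = 1
```

Nothing about this function is brain-like. It is a flexible parametric family of nonlinear regressions, and its statistical properties depend on how the weights are estimated.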
A Nonparametric Approach to Pricing and Hedging Derivative Securities via Learning Networks (1994)
We propose a nonparametric method for estimating the pricing formula of a derivative asset using learning networks. Although not a substitute for the more traditional arbitrage-based pricing formulas, network pricing formulas may be more accurate and computationally more efficient alternatives when the underlying asset's price dynamics are unknown, or when the pricing equation associated with the no-arbitrage condition cannot be solved analytically. To assess the potential value of network pricing formulas, we simulate Black-Scholes option prices and show that learning networks can recover the Black-Scholes formula from a six-month training set of daily option prices, and that the resulting network formula can be used successfully to both price and delta-hedge options out-of-sample. For comparison, we perform similar simulation experiments for four other estimation methods: OLS, kernel regression, projection pursuit, and multilayer perceptron networks. To illustrate the practical relevance of our network pricing approach, we apply it to the pricing and delta-hedging of S&P 500 futures options from 1987 to 1992.
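The sketch below illustrates the idea on a small scale under stated assumptions. It is not the authors' learning-network method. It prices calls with the Black-Scholes formula. The formula is homogeneous of degree one in the stock price and strike, so C/K depends only on moneyness S/K for a fixed maturity, rate, and volatility. The sketch then fits one of the comparison methods named above, a Gaussian-kernel Nadaraya-Watson regression, to recover C/K from simulated prices. The maturity, rate, volatility, grid, and bandwidth are illustrative assumptions.

```python
import numpy as np
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, tau, r, sigma):
    """Black-Scholes price of a European call on a non-dividend-paying stock."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)

def bs_delta(S, K, tau, r, sigma):
    """Black-Scholes call delta, N(d1)."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    return norm_cdf(d1)

def kernel_regression(x_train, y_train, x_eval, h):
    """Nadaraya-Watson estimator with a Gaussian kernel of bandwidth h."""
    u = (x_eval[:, None] - x_train[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return (w @ y_train) / w.sum(axis=1)

# Hypothetical experiment: learn C/K as a function of moneyness S/K.
tau, r, sigma = 0.5, 0.05, 0.2
m_train = np.linspace(0.8, 1.2, 401)
y_train = np.array([bs_call(m, 1.0, tau, r, sigma) for m in m_train])

m_test = np.linspace(0.9, 1.1, 21)  # interior points, away from the grid edges
y_true = np.array([bs_call(m, 1.0, tau, r, sigma) for m in m_test])
y_hat = kernel_regression(m_train, y_train, m_test, h=0.02)
max_abs_err = float(np.max(np.abs(y_hat - y_true)))
print(f"max |error| in C/K on test points: {max_abs_err:.5f}")
```

Delta-hedging with an estimated formula would use the slope of the fitted C/K curve with respect to S/K in place of `bs_delta`. That step is omitted here.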
The non-trading or non-synchronous effect arises when time series, usually financial asset prices, are taken to be recorded at time intervals of one length when in fact they are recorded at time intervals of another, possibly irregular, length. For example, the daily prices of securities quoted in the financial press are usually "closing" prices, the prices at which the last transactions in those securities occurred on the previous business day. These closing prices generally do not occur at the same time each day, but by calling them "daily" prices, we have implicitly and incorrectly assumed that they are equally spaced at 24-hour intervals. Such an assumption can generate spurious predictability in price changes and returns even if true price changes or returns are statistically independent. The non-trading effect induces potentially serious biases in the moments and co-moments of asset returns, such as their means, variances, covariances, and autocorrelation and cross-autocorrelation coefficients.
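A stylized simulation makes the spurious-predictability point concrete. The sketch assumes a single security with i.i.d. true returns of mean mu and standard deviation sigma. Each period, the security fails to trade with probability p. On non-trading days the observed return is zero, and the true returns accumulate until the next trade. In this stylized model, observed returns have lag-one covariance -p mu^2 and variance sigma^2 + 2 p mu^2 / (1 - p), so the induced autocorrelation is negative whenever mu is nonzero. The parameter values are hypothetical and exaggerated (mu = sigma) so that the effect is visible.

```python
import numpy as np

def simulate_nontrading(true_returns, p, rng):
    """Observed returns under random nontrading (stylized censoring model).

    Each period the security fails to trade with probability p. When it does
    not trade, its observed return is 0 and the true return accumulates; when
    it trades, the observed return is the sum of all true returns since the
    last trade.
    """
    observed = np.zeros_like(true_returns)
    pending = 0.0
    traded = rng.random(true_returns.size) >= p
    for t, r in enumerate(true_returns):
        pending += r
        if traded[t]:
            observed[t] = pending
            pending = 0.0
    return observed

def lag1_autocorr(x):
    """Sample first-order autocorrelation."""
    d = x - x.mean()
    return float(d[1:] @ d[:-1] / (d @ d))

# Hypothetical parameters: i.i.d. true returns with mean mu and s.d. sigma.
mu, sigma, p, T = 1.0, 1.0, 0.5, 200_000
rng = np.random.default_rng(seed=1)
true_r = rng.normal(mu, sigma, size=T)
obs_r = simulate_nontrading(true_r, p, rng)

# Closed form for this stylized model:
# Cov = -p mu^2, Var = sigma^2 + 2 p mu^2 / (1 - p).
rho_theory = -p * mu**2 / (sigma**2 + 2 * p * mu**2 / (1 - p))
print(lag1_autocorr(true_r), lag1_autocorr(obs_r), rho_theory)
```

With realistic daily returns, mu is tiny relative to sigma, so this own-autocorrelation is very small. The nontrading effect can be much larger for portfolios, where it works through cross-autocorrelations.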
Empirical Issues in the Pricing of Options and Other Derivative Securities (1992)
The pricing of options, certificates, and other derivative assets (financial assets whose payoffs depend on the prices of other assets) is one of the great successes of modern financial economics. Although derivative pricing is computationally intensive, it has received relatively little traditional empirical analysis because, by the very nature of arbitrage-based pricing, there is no error term to minimize. There are, however, many issues of statistical inference that affect the pricing of options and other derivatives. This paper analyzes two of the most commonly neglected issues in the literature: reduced-form empirical instruments for determining prices, and the use of Monte Carlo simulation to compute the prices of path-dependent options.
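The sketch below illustrates the Monte Carlo issue. It prices an arithmetic-average Asian call, a standard example of a path-dependent option, by simulating risk-neutral geometric Brownian motion paths and averaging the discounted payoffs. The choice of contract, the GBM dynamics, and all parameter values are assumptions for illustration, not the paper's specification. The returned standard error is a reminder that a simulated price is itself a statistical estimate.

```python
import numpy as np

def mc_asian_call(S0, K, T, r, sigma, n_steps, n_paths, seed=0):
    """Monte Carlo price of an arithmetic-average Asian call under risk-neutral GBM.

    Returns (price, standard_error) of the discounted-payoff average.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    # Exact GBM log-increments under the risk-neutral measure.
    log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    paths = S0 * np.exp(np.cumsum(log_inc, axis=1))
    avg_price = paths.mean(axis=1)           # arithmetic average over monitoring dates
    payoff = np.maximum(avg_price - K, 0.0)  # payoff depends on the whole path
    disc = np.exp(-r * T) * payoff
    return disc.mean(), disc.std(ddof=1) / np.sqrt(n_paths)

# Hypothetical contract: at-the-money, one year, 50 monitoring dates.
price, se = mc_asian_call(S0=100.0, K=100.0, T=1.0, r=0.05, sigma=0.2,
                          n_steps=50, n_paths=20_000)
print(f"Asian call ~ {price:.3f} (s.e. {se:.3f})")
```

Halving the standard error requires roughly four times as many paths. This is why variance-reduction techniques matter for path-dependent contracts.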
An Ordered Probit Analysis of Transaction Stock Prices (1992)
We estimate the conditional distribution of trade-to-trade price changes using ordered probit, a statistical model for discrete random variables. This approach recognizes that transaction price changes occur in discrete increments, typically eighths of a dollar, and at irregularly spaced time intervals. Unlike existing models of discrete transaction prices, ordered probit can quantify the effects of other economic variables, such as volume, past price changes, and the time between trades, on price changes. Using 1988 transactions data for over 100 randomly chosen U.S. stocks, we estimate the ordered probit model via maximum likelihood and use the parameter estimates to measure several transaction-related quantities, such as the price impact of trades of a given size, the tendency toward price reversals from one transaction to the next, and the empirical significance of price discreteness.
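The core of an ordered probit is simple. A latent continuous variable, linear in the covariates plus Gaussian noise, is cut into discrete states by estimated thresholds. The sketch below computes the state probabilities and the log-likelihood that maximum likelihood would maximize. It assumes three hypothetical states (down one tick, unchanged, up one tick), two illustrative covariates, and a constant noise variance. A fuller specification could let the variance depend on variables such as the time between trades.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def ordered_probit_probs(x, beta, alphas, sigma=1.0):
    """P(price change falls in each discrete state) under an ordered probit.

    Latent Z = x @ beta + eps, eps ~ N(0, sigma^2). The observed state is k when
    alphas[k-1] < Z <= alphas[k], with alphas sorted and the outer bounds
    implicitly -inf and +inf. Returns len(alphas) + 1 probabilities.
    """
    mean = float(np.dot(x, beta))
    cdf = [0.0] + [norm_cdf((a - mean) / sigma) for a in alphas] + [1.0]
    return np.diff(cdf)

def ordered_probit_loglik(X, states, beta, alphas, sigma=1.0):
    """Log-likelihood of observed state indices; maximize over (beta, alphas)."""
    return float(sum(np.log(ordered_probit_probs(x, beta, alphas, sigma)[s])
                     for x, s in zip(X, states)))

# Hypothetical three-state example: down one tick, unchanged, up one tick.
# Illustrative covariates: previous price change (ticks), signed log volume.
beta = np.array([-0.4, 0.3])  # negative coefficient on past change = reversal tendency
alphas = [-0.8, 0.8]
probs = ordered_probit_probs(np.array([1.0, 0.5]), beta, alphas)
print(probs)  # probabilities of (-1 tick, 0, +1 tick)
```

After an up-tick, the negative coefficient on the previous price change shifts probability toward a down-tick. This is how an estimated model quantifies price reversals.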
Long-Term Memory in Stock Market Prices (1991)
A test for long-run memory that is robust to short-range dependence is developed. It is an extension of the "range over standard deviation" or R/S statistic, for which the relevant asymptotic sampling theory is derived via functional central limit theory. This test is applied to daily and monthly stock return indexes over several time periods. Contrary to previous findings, there is no evidence of long-range dependence in any of the indexes over any sample period or sub-period once short-range dependence is taken into account. Illustrative Monte Carlo experiments indicate that the modified R/S test has power against at least two specific models of long-run memory, suggesting that stochastic models of short-range dependence may adequately capture the time series behavior of stock returns.
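The modified R/S statistic replaces the sample standard deviation in the classical statistic with the square root of a long-run variance estimate. That estimate is the sample variance plus 2 times the sum, over lags j = 1..q, of Bartlett weights 1 - j/(q + 1) times the sample autocovariances. The sketch below computes the statistic and its normalized version V = Q_n / sqrt(n). The AR(1) series and the truncation lag q = 10 are illustrative assumptions. They are chosen to show how short-range dependence inflates the classical (q = 0) statistic.

```python
import numpy as np

def modified_rs(x, q):
    """Modified rescaled-range statistic Q_n with a Bartlett-weighted s.d."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()
    partial_sums = np.cumsum(dev)
    range_ = partial_sums.max() - partial_sums.min()
    # Long-run variance: sample variance plus Bartlett-weighted
    # autocovariances up to lag q (q = 0 gives the classical R/S statistic).
    s2 = dev @ dev / n
    for j in range(1, q + 1):
        weight = 1.0 - j / (q + 1.0)
        s2 += 2.0 * weight * (dev[j:] @ dev[:-j]) / n
    return range_ / np.sqrt(s2)

def v_stat(x, q):
    """Normalized statistic V_n = Q_n / sqrt(n)."""
    return modified_rs(x, q) / np.sqrt(len(x))

rng = np.random.default_rng(seed=2)
iid = rng.standard_normal(2000)
# Hypothetical short-memory series: AR(1) with coefficient 0.5.
ar1 = np.zeros(2000)
eps = rng.standard_normal(2000)
for t in range(1, 2000):
    ar1[t] = 0.5 * ar1[t - 1] + eps[t]
print(v_stat(iid, q=0), v_stat(ar1, q=0), v_stat(ar1, q=10))
```

For the AR(1) series, the classical statistic is inflated relative to the modified one. Positive short-range autocorrelation widens the range of the partial sums even though no long-range memory is present.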
Data Snooping Biases in Tests of Financial Asset Pricing Models (1990)
Tests of financial asset pricing models may yield misleading inferences when properties of the data are used to construct the test statistics. In particular, such tests are often based on returns to portfolios of common stock, where the portfolios are constructed by sorting on some empirically motivated characteristic of the securities, such as market value of equity. Analytical calculations, Monte Carlo simulations, and two empirical examples show that the effects of this type of data snooping can be substantial.
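A stylized Monte Carlo shows the mechanism. It is not the paper's analytical framework. Under a true null, every simulated stock has zero expected return. The simulation ranks stocks on their sample mean return, forms a top-minus-bottom portfolio, and t-tests its mean. The test is run either on the same sample used for sorting or on a held-out second half. Sorting on past mean return is an extreme, hypothetical stand-in for a characteristic that is correlated with sample returns. The sample sizes and the 10-stock portfolio tails are assumptions.

```python
import numpy as np

def spread_t_stat(sort_returns, test_returns, n_extreme):
    """t-stat of a top-minus-bottom portfolio.

    Securities (rows) are ranked on their mean return in `sort_returns`; the
    portfolio's returns are then evaluated on `test_returns`.
    """
    order = np.argsort(sort_returns.mean(axis=1))
    bottom, top = order[:n_extreme], order[-n_extreme:]
    spread = test_returns[top].mean(axis=0) - test_returns[bottom].mean(axis=0)
    return spread.mean() / (spread.std(ddof=1) / np.sqrt(spread.size))

rng = np.random.default_rng(seed=3)
n_stocks, n_months, n_reps, n_extreme = 100, 60, 500, 10
reject_in, reject_out = 0, 0
for _ in range(n_reps):
    # Null is true by construction: every stock has an expected return of zero.
    R = rng.standard_normal((n_stocks, n_months))
    first, second = R[:, : n_months // 2], R[:, n_months // 2 :]
    # Snooped: sort and test on the same sample.
    reject_in += int(abs(spread_t_stat(R, R, n_extreme)) > 1.96)
    # Honest: sort on the first half, test on the untouched second half.
    reject_out += int(abs(spread_t_stat(first, second, n_extreme)) > 1.96)
print(reject_in / n_reps, reject_out / n_reps)
```

The same-sample test rejects far more often than its nominal 5% level. The held-out test stays close to that level.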
When Are Contrarian Profits Due To Stock Market Overreaction? (1990)
If returns on some stocks systematically lead or lag those of others, a portfolio strategy that sells "winners" and buys "losers" can produce positive expected returns, even if no stock's returns are negatively autocorrelated, as virtually all models of overreaction imply. Using a particular contrarian strategy, we show that, despite negative autocorrelation in individual stock returns, weekly portfolio returns are strongly positively autocorrelated, which reflects important cross-autocorrelations. We find that the returns of large stocks lead those of smaller stocks, and we present evidence against overreaction as the only source of contrarian profits.
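The mechanism can be checked in a two-stock toy economy, which is an assumption for illustration. The "large" stock's returns are i.i.d., and the "small" stock responds to the large stock's return with a one-period lag. Neither stock's returns are autocorrelated. Even so, a contrarian rule earns a positive average profit because of the cross-autocorrelation. The rule weights each stock by minus its lagged return in excess of the equal-weighted market, divided by the number of stocks. In this setup, with unit-variance shocks and a lag coefficient of 0.5, the expected profit per period works out to 0.125.

```python
import numpy as np

def contrarian_profits(R, lag=1):
    """Per-period profits of the rule w_it = -(1/N)(R_i,t-lag - R_m,t-lag).

    R is a (T, N) array of returns; R_m is the equal-weighted market return.
    The weights sum to zero each period: short past winners, long past losers.
    """
    T, N = R.shape
    market = R.mean(axis=1, keepdims=True)
    weights = -(R[:-lag] - market[:-lag]) / N
    return np.sum(weights * R[lag:], axis=1)

def lag1_autocorr(x):
    """Sample first-order autocorrelation."""
    d = x - x.mean()
    return float(d[1:] @ d[:-1] / (d @ d))

# Hypothetical two-stock economy: "large" is i.i.d.; "small" reacts to
# large's return one period later, plus its own i.i.d. noise.
rng = np.random.default_rng(seed=4)
T = 20_000
large = rng.standard_normal(T)
small = np.empty(T)
small[0] = rng.standard_normal()
small[1:] = 0.5 * large[:-1] + rng.standard_normal(T - 1)
R = np.column_stack([large, small])

profits = contrarian_profits(R)
print(lag1_autocorr(large), lag1_autocorr(small), profits.mean())
```

Both own-autocorrelations are near zero, so overreaction plays no role here. The profit comes entirely from the large stock leading the small one.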
An Econometric Analysis of Nonsynchronous Trading (1990)
We develop a stochastic model of nonsynchronous asset prices based on sampling with random censoring. In addition to generalizing existing models of nontrading, our framework allows the explicit calculation of the effects of infrequent trading on the time series properties of asset returns. This yields empirically testable implications for the variance, autocorrelations, and cross-autocorrelations of returns to individual stocks as well as to portfolios. We construct estimators to quantify the magnitude of nontrading effects in commonly used stock returns databases, and show the extent to which this phenomenon is responsible for the recent rejections of the random walk hypothesis.
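A stylized version of the random-censoring idea shows how nontrading alone can make a portfolio look predictable. The sketch assumes a one-factor economy with zero-mean, serially uncorrelated true returns. Each of N securities independently fails to trade with probability p each period. In this setup, the true factor return from k periods ago enters today's observed portfolio return with weight (1 - p) p^k. For large N, the observed equal-weighted portfolio's first-order autocorrelation therefore approaches p. All names and parameter values are hypothetical, and this is not the paper's estimator.

```python
import numpy as np

def observed_returns(true_returns, p, rng):
    """Apply independent random nontrading to each column of a (T, N) array.

    A security that does not trade shows a zero return; its true returns
    accumulate and are reported in full at its next trade.
    """
    T, N = true_returns.shape
    trades = rng.random((T, N)) >= p
    observed = np.zeros_like(true_returns)
    pending = np.zeros(N)
    for t in range(T):
        pending += true_returns[t]
        observed[t, trades[t]] = pending[trades[t]]
        pending[trades[t]] = 0.0
    return observed

def lag1_autocorr(x):
    """Sample first-order autocorrelation."""
    d = x - x.mean()
    return float(d[1:] @ d[:-1] / (d @ d))

# Hypothetical one-factor economy: zero-mean true returns, no autocorrelation.
rng = np.random.default_rng(seed=5)
T, N, p = 5_000, 200, 0.5
factor = rng.standard_normal((T, 1))
true_r = factor + rng.standard_normal((T, N))
obs_r = observed_returns(true_r, p, rng)

true_port = true_r.mean(axis=1)
obs_port = obs_r.mean(axis=1)
print(lag1_autocorr(true_port), lag1_autocorr(obs_port))  # second is near p
```

The true portfolio return is serially uncorrelated, yet the observed one is strongly positively autocorrelated. The individual securities' observed returns, which have zero mean here, show essentially no own-autocorrelation.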