AsianScientist (May 22, 2015) – By Rebecca Tan – The field of econometrics has been energized by the ready availability of high-frequency financial data and tools to analyze them. Since 1992, analysts have had access to all trades and quotes on the New York Stock Exchange on a tick-by-tick basis—a staggering amount of information. At the same time, computers are more powerful and affordable than ever before, putting the hardware needed to handle massive amounts of data at their disposal.
Nonetheless, hardware alone does not a revolution make. At the end of the day, econometrics remains a scientific discipline, relying on both theory and empirical data to make sense of the world. In short, now that the constraints of data availability and hardware have been eased, ideas are taking center stage.
Hosted this year by the Singapore Management University (SMU) School of Economics, the “Frontiers in Econometrics” conference saw over 20 leading econometricians come together to share their latest research on ideas that would move the field forward.
Co-organized with the Bendheim Center for Finance at Princeton University, the National Center for Econometric Research (NCER) at Queensland University of Technology and the Antai School of Management at Shanghai Jiaotong University, the two-day Princeton-QUT-SJTU-SMU conference was held on 18 and 19 April 2015 at SMU.
A century-old solution
One of the interesting ideas put forward at the conference was by Princeton’s Professor Yacine Aït-Sahalia, who developed a method to apply principal component analysis (PCA) to high frequency financial data.
First developed in 1901, PCA is one of the oldest and most commonly used techniques in empirical econometrics for dimension reduction. PCA is a way of determining if there are any common components in a large dataset that can effectively summarize the variation in the data, described by two terms called the eigenvector and the eigenvalue.
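The mechanics described above can be sketched in a few lines. This is a generic textbook PCA on simulated data, not Aït-Sahalia's high-frequency estimator: the eigenvectors of the sample covariance matrix give the common components, and each eigenvalue measures how much variation its component summarizes.

```python
import numpy as np

# Illustrative PCA via eigendecomposition of a sample covariance
# matrix (simulated data; not the conference dataset or method).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))   # 1,000 observations of 5 variables
X -= X.mean(axis=0)                  # center each variable

cov = (X.T @ X) / (len(X) - 1)       # sample covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort components from largest to smallest eigenvalue.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Each eigenvalue's share of the total is the fraction of variation
# that one common component captures.
explained = eigenvalues / eigenvalues.sum()
print(explained)
```

For dimension reduction, one keeps only the first few columns of `eigenvectors` — the components whose eigenvalues account for most of the variation.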
While PCA has been successfully used on independent data, it has remained inapplicable to time series data such as the returns of stocks over time. This is because the largest eigenvalue cannot be accurately estimated if the cross-sectional dimension grows at the same rate as the sample size, a phenomenon known as the curse of dimensionality.
Dimensionality has to do with the size of a search space: low-dimensional problems have small search spaces, but the space grows exponentially with each dimension added.
“If you’re working on applications in finance,” said Aït-Sahalia, “you may be looking at a portfolio of several hundred stocks. For example, the Standard & Poor’s 500 Index (S&P 500) would give you a dimension (d) of 500. If you use ten years of daily data, you still only have about 2,500 observations (n), making n/d = 5.”
This high dimensionality leaves the data too sparsely scattered in the mathematical space for statistical inference, forcing researchers to use many years of data to increase the number of observations, Aït-Sahalia said. Unfortunately, doing so requires assumptions that are unlikely to hold over such a long time period, making the estimation inaccurate.
Filling in the space with high frequency data
However, if high frequency data is used instead, the problem of high dimensionality becomes tractable. The reason is that high frequency data yields a large amount of time series data in a short amount of time. Going from once-a-day data to one observation per second can give researchers more than 23,000 observations within a single 6.5-hour trading day, quickly growing the value of n.
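The arithmetic behind this comparison is straightforward (assuming a 6.5-hour NYSE trading day and roughly 252 trading days per year):

```python
# One day of per-second data versus ten years of daily data.
seconds_per_trading_day = int(6.5 * 60 * 60)  # 6.5 hours of trading
daily_obs_ten_years = 10 * 252                # ~252 trading days per year

print(seconds_per_trading_day)  # observations from one day at 1-second frequency
print(daily_obs_ten_years)      # observations from ten years at daily frequency
```

A single trading day of per-second data thus supplies roughly nine times as many observations as a decade of daily data.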
This in turn makes the assumption of stationarity more reasonable: the data need only remain stationary over a week or a month, rather than over as much as 50 years, Aït-Sahalia explained.
Motivated by this line of reasoning, Aït-Sahalia developed a method to conduct PCA on high frequency financial data, applying it to an analysis of intraday returns on a portfolio of 30 S&P 500 stocks over the years 2003–2012.
“What we found was that the first eigenvalue was quite large—accounting for 30 to 40 percent of the variations—while the second and third eigenvalues together captured 15 to 20 percent of the variation. This suggests that there is indeed a low dimensional common factor structure to our high frequency data,” he said.
Interestingly, this was similar to the results of low frequency analysis, which showed that three to four factors at most are required to explain the cross section of asset returns.
“Our method does not require the assumption of iid (independent and identically distributed) normality and the rapid increase in observations allows us to work with relatively large dimensions. We are currently extending our work from 30 to 100 stocks and we see essentially the same results,” Aït-Sahalia concluded.
“Yacine’s work extended a classical theory to cover the exciting realm of high frequency data. His asymptotic approach of making the sample interval tend to zero is a refreshing and novel one that I am sure has sparked fresh ideas among the conference attendees,” said conference co-chairman Professor Yu Jun of SMU, on the sidelines of the conference.
Apart from Aït-Sahalia, presenters included Professor Stan Hurn from the Queensland University of Technology, Professor Vance Martin from the University of Melbourne and Associate Professor Ryo Okui from Kyoto University, among others. Speakers covered topics ranging from non-parametric estimation of jump diffusion models (Assistant Professor Wang Bin, SJTU) to the estimation and testing of time-varying factor models (Associate Professor Jin Sainan, SMU).
Asian Scientist Magazine is a media partner of the Singapore Management University Office of Research.