Introduction to Compustat
This document is intended to provide background information for those who want to use Compustat data in financial research projects.
Compustat data files are produced by Standard & Poor's Institutional Market Services, a division of McGraw-Hill, Inc. The principal contents of the data files are the items reported by companies in standard financial reports, such as quarterly and annual income statements, balance sheets, and cash flow statements. Separate files in the North American database cover U.S. and Canadian firms, while the Global files cover companies in more than eighty other countries.
The terms of the Compustat database license stipulate that the data may be used only for academic research. Commercial use and republication of the data are prohibited without permission. The following citation should be given when Compustat data are used in academic research:
Source: Standard & Poor's Compustat.
Compustat data are accessible through the WRDS system at the Wharton School of the University of Pennsylvania. Comprehensive documentation is available through WRDS.
In addition, Standard & Poor's provides similar data in a Windows-based desktop application, Research Insight, which can be used on public PCs in Baker Library, Baker Research Services, and Research Computing Services (Source: Standard & Poor's Research Insight). Research Insight incorporates a direct link to Excel, customized reporting applications, and derived data items not directly available in Compustat files. However, Research Insight generally covers shorter windows of time (twenty years of annual data and ten years of quarterly data) than do the Compustat files.
The most commonly used Compustat data file is probably the fundamentals file, which covers non-financial U.S. companies. Its annual and quarterly data date back to 1950 and 1962, respectively. The Bank file includes similar data for U.S. banks, starting in 1950 (annual) and 1961 (quarterly). The Segments files include companies' self-reported line-of-business and geographic segment data from 1979. Because these segment data are self-reported, however, the information is not based on standardized definitions of lines of business and geographic areas.
While Compustat data are ultimately derived from corporate reports, the data are often reclassified into standard categories and revised when companies issue restatements. Consequently, Compustat data will sometimes differ from those in published company reports.
The principal data items available in Compustat are those reported in standard corporate income statements, balance sheets, and cash flow statements. These include nearly 400 different data items and numerous associated footnotes. A complete list can be found at WRDS, through the Documentation link at the top of the web query page. Complete descriptions and definitions can be found through the Data Manuals link, also at the top of the web query page.
Please note that the data items differ across the annual and quarterly files, and across the industrial and bank files.
Compustat has defined a proprietary identifier, the GVKEY, for each company in the database. The GVKEY can be used to track a company over time, even though the company's name, CUSIP, or ticker may change.
The SMBL variable can usually, but not always, be identified with the ticker symbol of the company's publicly traded common stock. However, the same ticker can be assigned to different companies on different exchanges, and company tickers can change over time. For example, the ticker "C" has been used by both Chrysler and Citigroup. Consequently, the ticker symbol is often an unreliable way to identify companies over time and across databases.
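The ticker problem can be illustrated with a minimal sketch. The records below are invented for illustration: the GVKEY values are placeholders rather than real Compustat identifiers, and the function name ambiguous_tickers is hypothetical. The sketch simply groups records by ticker and reports any ticker that maps to more than one GVKEY.

```python
from collections import defaultdict

# Invented records for illustration only: (gvkey, ticker, company name).
# The GVKEY values are placeholders, not real Compustat identifiers.
RECORDS = [
    ("G0001", "C", "Chrysler"),
    ("G0002", "C", "Citigroup"),
    ("G0003", "XYZ", "Example Corp"),
]


def ambiguous_tickers(records):
    """Return {ticker: sorted GVKEYs} for tickers shared by more than one GVKEY."""
    gvkeys_by_ticker = defaultdict(set)
    for gvkey, ticker, _name in records:
        gvkeys_by_ticker[ticker].add(gvkey)
    return {
        ticker: sorted(gvkeys)
        for ticker, gvkeys in gvkeys_by_ticker.items()
        if len(gvkeys) > 1
    }


print(ambiguous_tickers(RECORDS))
```

A check of this kind, run before merging on ticker, shows where a merge would silently combine different companies; merging on GVKEY avoids the problem.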
CUSIP is a nine-character alphanumeric identifier assigned to individual financial assets by an independent agency. As a rule, the first six characters of the CUSIP can be identified with a company; the next two characters (the seventh and eighth) identify a particular asset (e.g., a class of stock or a bond issue) issued by the company; and the ninth character is a "check digit" to improve the accuracy of electronic transmission of CUSIPs.
Within Compustat, the first six characters of the CUSIP are referred to as the CNUM (CUSIP issuer code), and the last three characters of the CUSIP are called the CIC (CUSIP issue code). Throughout much of the history of Compustat data files, the CNUM was used as a company identifier, but it has recently been supplanted in this function by the GVKEY. In any event, the Compustat files do not track a company's CUSIP history. Rather, only the most recent CUSIP or CNUM is included.
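The CUSIP layout described above can be sketched in a few lines of Python. The function names split_cusip and cusip_check_digit are hypothetical, and the check-digit routine follows the commonly documented "modulus 10 double-add-double" scheme rather than anything stated in this document. The sketch handles only digits and letters, not the special characters that some CUSIPs contain. The sample value is the widely published CUSIP for Apple common stock.

```python
def split_cusip(cusip):
    """Split a nine-character CUSIP into (CNUM, CIC) as Compustat names them."""
    if len(cusip) != 9:
        raise ValueError("a full CUSIP has nine characters")
    return cusip[:6], cusip[6:]


def cusip_check_digit(first_eight):
    """Compute the ninth character from the first eight (digits and letters only)."""
    if len(first_eight) != 8:
        raise ValueError("expected the first eight characters of a CUSIP")
    total = 0
    for index, char in enumerate(first_eight.upper()):
        if char.isdigit():
            value = int(char)
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10  # A=10, B=11, ..., Z=35
        else:
            raise ValueError("unsupported character: " + repr(char))
        if index % 2 == 1:
            value *= 2  # double every second character (2nd, 4th, 6th, 8th)
        total += value // 10 + value % 10  # add the digits of the value
    return (10 - total % 10) % 10


cnum, cic = split_cusip("037833100")
print(cnum, cic)                          # 037833 100
print(cusip_check_digit("03783310"))      # 0
```

Comparing the computed digit with the ninth character is a cheap way to catch typing or truncation errors before matching CUSIPs across databases.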
Until recently, Compustat used its DNUM variable to indicate a company's main line of business. This code was generally based on the SIC (Standard Industrial Classification) code created by the U.S. Census Bureau. The Census Bureau is replacing SIC codes with NAICS (North American Industry Classification System) codes. Further, Standard & Poor's and Morgan Stanley Capital International (MSCI) have developed GICS (Global Industry Classification Standard) codes.
A company's industrial classification can change from time to time, but the DNUM in Compustat data files generally reflects only the current classification, not the history of a company's industry affiliations.
Fiscal Year End
Because Compustat data are based on corporate reports, dates within Compustat generally refer to fiscal, rather than calendar, periods. For example, March 31 is the end of the first quarter of the calendar year. It is also the end of the first fiscal quarter for a company whose fiscal year ends in December, but the end of the fourth fiscal quarter for a company whose fiscal year ends in March. In the latter case, moreover, fiscal year 2003 ends in calendar year 2004. The Fiscal Year End variable in each data record indicates the ending month of a company's fiscal year.
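The mapping between calendar dates and fiscal periods can be sketched as follows. The function names are hypothetical. The fiscal-year labeling rule is an assumption consistent with the example above: fiscal years ending in January through May are labeled with the prior calendar year, and those ending in June through December with the current one. The Compustat documentation at WRDS should be checked before relying on it.

```python
def fiscal_quarter(calendar_month, fye_month):
    """Fiscal quarter (1-4) containing a calendar month, given the fiscal year-end month."""
    # Months elapsed since the fiscal year began (0-11); the fiscal year
    # starts in the month after fye_month.
    months_into_year = (calendar_month - fye_month - 1) % 12
    return months_into_year // 3 + 1


def fiscal_year_label(end_calendar_year, fye_month):
    """Fiscal year label for a fiscal year ending in the given calendar year.

    Assumption: year-ends in January-May take the prior calendar year's label.
    """
    if 1 <= fye_month <= 5:
        return end_calendar_year - 1
    return end_calendar_year


print(fiscal_quarter(3, 12))        # 1: March is in Q1 for a December year-end
print(fiscal_quarter(3, 3))         # 4: March is in Q4 for a March year-end
print(fiscal_year_label(2004, 3))   # 2003: a year ending March 2004 is fiscal 2003
```

When aligning Compustat records with calendar-dated data such as stock returns, converting through the fiscal year-end month in this way avoids matching a fiscal period to the wrong calendar window.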