MSFI 435 Module 9: Data

Objectives
Discuss the need for, and the challenges of, collecting data for an empirical analysis
Analyze the determinants of executive compensation
Analyze the uncertainty in predictions extracted from regression models

9.1. An Examination When Data Is Ready

You are interested in the determinants of executive compensation. In particular, you want to examine whether the compensation of the Chief Executive Officer (CEO) of a public company is related to (or depends on) the performance of the firm she manages.

Figure 9-1. Table with data ready to go

Suppose you have access to a table with information on CEO compensation (say, the total amount of cash the CEO took home in the year 2000) and the performance of her firm (say, the average monthly return on the firm's stock, plus a measure of return on assets, both measured in 1999), as shown in Figure 9-1. Then you can run a regression model such as

TOTALPAYi = β0 + βAVGRET AVGRETi + βROA ROAi + εi

and examine your idea. For example, you can formally test the hypothesis that (past) performance matters for CEO compensation. If you are interested in performance as proxied by stock market performance, then you are testing

H0: βAVGRET = 0
Ha: βAVGRET ≠ 0

If you are interested in whether any of the proxies for performance matters, then you can run a test of multiple hypotheses, as in

H0: βAVGRET = βROA = 0
Ha: βAVGRET ≠ 0 or βROA ≠ 0

You can examine the magnitude of the coefficients, the t-statistics for single hypotheses, the F-statistic for multiple hypotheses, the explanatory power of your model, and so on. The econometrics of your analysis—the set of statistical tools employed in your study—comes directly from our discussion of multiple regression models.

Later on, though, the danger of an omitted variable bias comes to mind, and you realize that this regression model is too simplistic—that you need to add other explanatory variables. For example, you might need to control for the size of the firms, as larger firms are more complex and thus
should award higher compensation to their CEOs. No problem: you just imagine the table in Figure 9-1 expanded to include a column for firm size, and you run an expanded regression with firm size as a control (or explanatory) variable. Notice, though, that this expanded model does not bring any new requirements in terms of the econometrics of your study.

At this point, running your study is, shall we say, simple, in the sense that we have "all" we need in terms of statistics. We are ready to implement any empirical analysis: think of two measures and we can run a regression model to examine their relationship!

Not so fast. Of course, there is a catch here. The econometrics of running a regression model can indeed be simple, but running a regression model is merely the last step of a study. To get there, one needs to perform other steps beforehand, and some of these steps can be quite complex and laborious. In particular, we assumed the data for the regression model, represented in Figure 9-1, was ready to go. This obviously did not happen by magic. A lot of work is required with respect to data definition (which data do we need in our analysis?), data collection (where can we find the data?), and data manipulation (how can we combine data from different sources in order to get the table in Figure 9-1?).

In this module, we will examine one specific study from start to finish, with a special emphasis on data preparation. You will realize that most of the effort in an applied analysis comes from the steps that take place before any statistical test is ever performed.

9.2. Steps in an Empirical Analysis

In this course, we examine questions one can answer with the analysis of data. The applied analytical approach involves several common steps. After defining a question, we need to choose the proper data and the set of statistical tests necessary to address the question. We then have to implement the tests and analyze the results.
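Implementing such tests is mechanically simple once the data table exists. As an illustration, the single and joint hypothesis tests from section 9.1 can be sketched in Python with plain numpy (a stand-in for the SAS procedures used later in this module; the simulated data and the coefficient values 25 and 10 are purely illustrative, not estimates from the actual study):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated firm-level data (illustrative only, not the real sample)
avgret = rng.normal(1.0, 2.0, n)    # average monthly return, in percent
roa = rng.normal(5.0, 3.0, n)       # return on assets, in percent
totalpay = 800 + 25 * avgret + 10 * roa + rng.normal(0, 100, n)

# OLS for: TOTALPAY_i = b0 + b_AVGRET*AVGRET_i + b_ROA*ROA_i + e_i
X = np.column_stack([np.ones(n), avgret, roa])
beta, *_ = np.linalg.lstsq(X, totalpay, rcond=None)

resid = totalpay - X @ beta
k = X.shape[1]
s2 = resid @ resid / (n - k)                 # residual variance
cov = s2 * np.linalg.inv(X.T @ X)            # coefficient covariance matrix
t_stats = beta / np.sqrt(np.diag(cov))       # each tests H0: beta_j = 0

# F-test of the joint hypothesis H0: b_AVGRET = b_ROA = 0
X0 = np.ones((n, 1))                         # restricted model: intercept only
b0, *_ = np.linalg.lstsq(X0, totalpay, rcond=None)
ssr_r = np.sum((totalpay - X0 @ b0) ** 2)    # restricted sum of squared residuals
ssr_u = resid @ resid                        # unrestricted sum of squared residuals
q = 2                                        # number of restrictions
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
print(beta, t_stats, F)
```

Large t-statistics reject the single hypotheses, and a large F-statistic rejects the joint hypothesis that neither performance proxy matters.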
We briefly discuss each of these steps in the context of a study of the determinants of executive compensation.

Step 1: Description of the Study

We want to analyze the determinants of executive compensation. In particular, we want to examine what drives the compensation of CEOs in U.S. public companies. CEO compensation is a much-discussed topic in the media—for example, with claims that such compensation is going through the roof, or that it is not necessarily linked to shareholders' interests.

Our main idea is to test whether CEO compensation is related to the performance of the firm. One prevailing view in management theory is that CEO compensation should be linked to firm performance: you pay more when the firm performs well. This is our main hypothesis to be tested.

You realize, though, that there are many other potential determinants of CEO compensation. As mentioned before, larger firms are more complex and might demand more effort from CEOs. Thus, firm size could be a determinant of compensation, and we need to add it as a control variable in our analysis. By the same token, firms with larger growth opportunities are deemed more complex and should better reward their executives. Finally, some CEO characteristics—such as experience—can be determinants of CEO compensation. Therefore, proxies for firm growth and CEO characteristics need to be added as control variables as well.

We need to examine many determinants of executive compensation—firm performance, firm size, growth prospects, CEO characteristics, etc. In this study, we focus on the effects of firm performance on compensation, while treating the other determinants as control variables. However, all these variables will equally be right-hand-side (RHS) variables in a regression model. Whether a variable is your main focus or merely a control variable depends on your objective.
For example, if you were interested in CEO characteristics as the main driver of compensation, then you would treat firm performance as a control variable.

Step 2: Data Requirements

A brief reading of the description of the study in step (1) highlights some data needs. For example, we need data on CEO compensation, measures of firm performance, firm size, and growth prospects, and measures of CEO characteristics. A thorough search is required in order to identify whether and where these data are available, in which format, etc. Section 9.3 discusses this step in more detail.

Step 3: Statistical Tools Requirements

With your main idea on hand and the availability of data established, the next step is to determine which set of statistical tools will be employed. You can anticipate that you will run a regression model where the dependent variable is the level of CEO compensation and the explanatory variables are firm performance, firm size, etc.

Step 4: Implementation

After establishing the data you have and the models you want to run, you need a plan to implement the analysis. This involves a series of steps that take you from raw data to outputs from statistical tests. In our example, these steps will prepare a table like the one shown in Figure 9-1, such that a regression model can be easily executed from the table. We discuss this step in section 9.4.

Step 5: Analysis of Results

This last step is the most important of the study. Here you will analyze the results of your statistical tests and extract the inferences from the study. This is discussed in section 9.5.

9.3. A Detailed Analysis of Data Requirements

An extensive search uncovered three sources for the data required for our study of executive compensation. Figure 9-2 shows the tables with the data we will use in this study.

First, we need data on CEO compensation. The COMPENSATION_DATA table contains such data.
For each firm and year, it shows the age of the CEO (CEOAGE), the CEO's salary, total cash pay (salary plus bonus), and total pay (salary plus bonus plus stock and options). The compensation variables are measured in thousands of dollars.

Figure 9-2. Tables with the data necessary for the study of executive compensation

Data on firm returns come from the table MSF (an acronym for monthly stock file) in the CRSP database. It contains, for each firm and each month, the return on the firm's stock in that month (RET), the price (in US$) of the stock at the end of the month (PRC), and the number (in thousands) of outstanding shares (SHROUT). A firm is identified by the variable PERMNO in this table. Take the first visible row in the MSF dataset. It shows that the firm identified by PERMNO=10001 had a stock price of $9.94 on January 31st, 1990. The stock's return over that month was –0.019, or –1.9%, and the firm had about 1 million shares outstanding at the end of the month.

Finally, we need accounting data from the FUNDA dataset in COMPUSTAT. A portion of the dataset is represented in the figure. It shows, for each firm and each fiscal year, the amount of sales (SALE) and the firm's total assets (AT). All dollar variables are measured in millions of dollars. Here, a firm is identified by the variable GVKEY.

One important task of the data requirement analysis is to figure out the primary key of each source of data. The primary key is the identifier of each table—what uniquely identifies each row of the table. In the table with executive compensation, notice that a firm identifier alone is not the primary key, since we have more than one row for each firm. Instead, what uniquely identifies each observation in the table is the pair of a firm identifier and a period. Notice that the primary key is hinted at by the way we describe a table.
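A candidate primary key can be verified mechanically by checking for duplicate rows on it. A minimal pandas sketch (the toy rows below are made up, not taken from the actual CRSP file):

```python
import pandas as pd

# Toy stand-in for the MSF table: one row per firm-month
msf = pd.DataFrame({
    "PERMNO": [10001, 10001, 10002],
    "DATE":   ["1990-01-31", "1990-02-28", "1990-01-31"],
    "RET":    [-0.019, 0.034, 0.010],
})

# PERMNO alone is NOT a primary key: the same firm appears in many months
assert msf.duplicated(subset=["PERMNO"]).any()

# The pair (PERMNO, DATE) uniquely identifies each row, so it is a valid key
assert not msf.duplicated(subset=["PERMNO", "DATE"]).any()
print("(PERMNO, DATE) uniquely identifies each row")
```

The same check, run with (GVKEY, DATADATE) on FUNDA or (firm, year) on the compensation table, confirms the keys described above.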
For the table with executive compensation, we had the description "for each firm and year, it shows …", which implies that the pair of firm and year uniquely identifies a row. For the MSF table, the primary key is the pair of firm identifier (PERMNO) and period (DATE). For the FUNDA table, the primary key is the pair of firm identifier (GVKEY) and fiscal year end (DATADATE). The primary keys will be important in determining how to combine different tables.

This analysis also highlights one common difficulty of analyses involving multiple sources of data. Notice that while both MSF and FUNDA have data on firms, the firm identifier is different in each table. This is inevitable as data come from different sources, and the providers of the data choose different identifiers for each entity. In this case, a firm is identified by the variable PERMNO in the MSF table and by the variable GVKEY in the FUNDA table.

9.4. Statistical Tools and Implementation

We now get into the details of the implementation. Our main idea is to run a regression roughly expressed as

COMPENSATIONi = β0 + βAVGRET AVGRETi + βROA ROAi + βSIZE SIZEi + … + εi

Many decisions are needed in order to narrow down the exact format of the regression model and its data needs.

What measures of compensation? There are many measures of compensation. We will examine the following: the base salary, the bonus, the amount of stocks and options, and the total pay. We can extract these measures from the variables in the COMPENSATION_DATA table.

How to measure firm performance? We adopt two measures: a measure of financial performance based on stock market performance (the average monthly return over the year) and a measure of accounting performance based on return on assets.

Which proxies for firm size? The regression model above refers to a general measure of size. We have different variables that can proxy for the size of the firm: total assets (AT), sales (SALE), and market value of equity. Which one should we adopt in our regression?
One can argue that each variable is a reasonable proxy for firm size. Since a priori we do not know which is preferable, we adopt all of them in our model.

Which proxy for growth opportunities? It is common to use the ratio of market value of equity to book value of equity as a proxy for growth opportunities. These measures are obtained from the FUNDA dataset: the variable BKVLPS contains book value per share and PRCC_F contains market value per share. We adopt this proxy as well.

Which period to use? We have roughly 10 years of compensation data. We could run the regression using the entire sample, but we will see later that this can be problematic. Instead, we opt to run the regression on a single cross-section—the year 2000.

Should we use log values of the variables? We adopt log measures of the proxies for firm size and growth opportunities. As before, we use the logarithm to account for possible nonlinearity in the relationship between the dependent and independent variables. For the compensation measures, we will explore two alternatives, with and without the log specification. Interpretation of the coefficients needs to take this choice into consideration.

Should we use contemporaneous measures of the RHS and LHS variables? No. We will lag by one year the independent variables that proxy for firm performance, firm size, and firm growth options. For example, we will regress compensation measured in the year 2000 on the proxies for firm performance measured during the year 1999. Our regression setup thus serves a predictive role: given the measures of performance, firm size, and growth options, we can predict compensation in the following year. Lagging RHS variables is also a common approach to address endogeneity concerns related to reverse causality.
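Lagging the RHS variables amounts to pairing compensation observed in year t with performance measured in year t–1. A hypothetical pandas sketch of that pairing (the column names and values are illustrative, not from the actual tables):

```python
import pandas as pd

# Compensation observed in years 2000 and 2001 (toy values)
comp = pd.DataFrame({"PERMNO": [10001, 10001], "YEAR": [2000, 2001],
                     "TOTALPAY": [950.0, 1100.0]})

# Performance measured in years 1999 and 2000 (toy values)
perf = pd.DataFrame({"PERMNO": [10001, 10001], "YEAR": [1999, 2000],
                     "AVGRET": [1.2, -0.4], "ROA": [6.1, 5.5]})

# Shift the performance year forward by one, so that year-1999 performance
# lines up with year-2000 compensation when we merge on (PERMNO, YEAR)
perf_lag = perf.rename(columns={"AVGRET": "AVGRET_LAG", "ROA": "ROA_LAG"}).copy()
perf_lag["YEAR"] = perf_lag["YEAR"] + 1

merged = comp.merge(perf_lag, on=["PERMNO", "YEAR"], how="inner")
print(merged)  # compensation in 2000 paired with AVGRET_LAG = 1.2 from 1999
```

After the merge, a regression of TOTALPAY on AVGRET_LAG and ROA_LAG implements exactly the predictive, lagged specification described above.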
Specifically, if we regress compensation measured in year t on performance also measured in year t, it is difficult to establish whether performance is driving compensation or compensation is driving performance. Examining whether compensation drives performance is also a very important question, but it is not the one explored in this analysis.

With these details on what exactly we are examining, we are ready to implement our analysis. Our main effort involves preparing a table like the one in Figure 9-1 from the original data in the tables shown in Figure 9-2. With the table in Figure 9-1 ready to go, we can then run regression models. We implement this study with an algorithm formed by the following steps:

1. Read the data on executive compensation
2. For each firm and year, compute the average return in the previous year
3. For each firm and year, collect accounting data measured in the previous year
4. Define regression variables
5. Run the regressions

We now discuss each step and how to implement it in SAS. The complete code appears in section 9.7.

Step 1. Read the data on executive compensation

This could simply be a DATA step reading a SAS dataset. However, the data on executive compensation was supplied in an Excel spreadsheet named "compensation_data.xlsx" (available on Canvas). We can use PROC IMPORT to read an Excel file into a SAS dataset:

proc import datafile="U:\Teaching\MSFI 435\Data\compensation_data.xlsx"
    out=lib435.compensation_data_sas
    dbms=xlsx
    replace;
    sheet='Compensation';
run;

Notice we referred to the sheet inside the Excel workbook that contains the data. We also request that SAS create a permanent dataset COMPENSATION_DATA_SAS in the library named LIB435. We then proceed to read this new dataset into a temporary dataset named D.
At this point, everybody should be familiar with each dataset, either through the use of PROC PRINT or PROC CONTENTS, or by opening the dataset using the Table View feature of the SAS Windows environment. We include in our code a PROC CONTENTS for each dataset used in the examination.

Step 2. For each firm and year, compute the average return in the previous year

Recall that we are lagging most of the RHS variables—hence the need to compute the average return in the year prior to the year in which compensation is observed. Lines 23-64 implement this step. The primary key of our compensation data is the pair (PERMNO, YEAR). Thus, for each such pair, we collect monthly return data for the previous year. Notice that, as usual, this is implemented on the WRDS server. The return data is stored in the dataset D_RET. The PROC MEANS in line 45 then generates a table with the average return computed, as established through the BY clause, for each pair of PERMNO and YEAR. Since D_RET contains monthly returns for all months in the previous year, the average return, stored in AVGRET, refers to the average return in the previous year. The output is stored in the dataset D_STATS. (As always, PROC MEANS with a BY clause requires the dataset to be pre-sorted; this is accomplished by the ORDER BY specification in the previous PROC SQL.) Finally, the PROC SQL starting in line 50 combines the contents of D_STATS back into our main dataset D. We end the procedure by downloading the data from the WRDS server.

Step 3. For each firm and year, collect accounting data measured in the previous year

We now combine our main dataset D with the accounting variables in the dataset FUNDA. Care is required in that the firm identifier to be used is now GVKEY, so we first collect the GVKEY for each firm (as we had done in Module 7). Again, we combine compensation data in year t with accounting data in year t–1.
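The PROC MEANS step that computes one average return per (PERMNO, YEAR) pair has a direct analogue in a pandas group-by, sketched below on made-up monthly returns (the dataset names D_RET and D_STATS mirror the SAS code; the values are illustrative):

```python
import pandas as pd

# Toy monthly return data for one firm-year (values are made up)
d_ret = pd.DataFrame({
    "PERMNO": [10001, 10001, 10001],
    "YEAR":   [1999, 1999, 1999],
    "RET":    [0.02, -0.01, 0.05],
})

# Analogue of PROC MEANS with "BY PERMNO YEAR": one AVGRET per pair.
# (groupby does not require pre-sorting, unlike the BY clause in SAS.)
d_stats = (d_ret.groupby(["PERMNO", "YEAR"], as_index=False)["RET"]
                .mean()
                .rename(columns={"RET": "AVGRET"}))
print(d_stats)  # AVGRET = 0.02 for PERMNO 10001 in 1999
```

Merging d_stats back into the main table on (PERMNO, YEAR) then plays the role of the PROC SQL join in line 50.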
The whole procedure, which mimics the one we used in Module 7, appears in lines 66-131.

Step 4. Define regression variables

The DATA step in line 134 defines the variables. First, we define BONUS and STOCKOPTION. Second, we define the logarithm versions of the measures of compensation, firm size, and firm growth options.

Step 5. Run the regressions

The regression models are implemented starting on line 164. Notice that before running the regressions we restrict the sample to observations from year 2000 only, and we also request an output containing some basic summary statistics on the variables available in dataset D (when PROC MEANS has no VAR statement, it refers to all numeric variables in the dataset).

9.5. Analysis of Results

We show in Figure 9-3 the summarized results of regression models explaining the logarithm versions of SALARY, BONUS, OPTIONS, and TOTALPAY. For each coefficient, we show the estimate and its t-statistic (in brackets).

Let's look at the first column, related to the regression model

log(TOTALPAYi) = β0 + βAVGRET AVGRETi + βROA ROAi + εi

For now we are ignoring all other control variables. The results suggest a strong influence of past performance on total pay. Both the coefficients on average return and return on assets are significantly different from zero. Since we use the log version of the dependent variable, analyzing the effect of performance involves a log-lin specification: the OLS coefficient measures the proportional change in the dependent variable for each one-unit change in the independent variable. Since AVGRET is expressed in percentage terms (line 138 shows that returns are multiplied by 100), a one-unit change in average return means an increase of 1%. Thus, the regression result suggests that an extra 1% in average returns in the previous year increases the total compensation package of the firm's CEO at a rate of 0.0128, or 1.28%.
For the measure of return on assets, we again find a significant effect on total compensation: an extra 1% in return on assets implies a rate of increase of 0.0070, or 0.70%, in total compensation for the firm's CEO.
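The log-lin reading of these coefficients can be checked numerically: for a one-unit change in the regressor, the exact proportional change in the dependent variable is exp(β) − 1, which the coefficient itself approximates well when β is small. A short sketch using the two estimates reported above:

```python
import math

b_avgret = 0.0128   # estimated coefficient on AVGRET (log-lin regression)
b_roa = 0.0070      # estimated coefficient on ROA

# Exact proportional change implied by a one-unit (1 percentage point)
# increase in each regressor; for small b, exp(b) - 1 is approximately b
exact_avgret = math.exp(b_avgret) - 1   # close to 0.0128, i.e. about 1.28%
exact_roa = math.exp(b_roa) - 1         # close to 0.0070, i.e. about 0.70%
print(exact_avgret, exact_roa)
```

The approximation error is negligible at these magnitudes, which is why reading 0.0128 directly as a 1.28% increase is safe here.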