The mean is sensitive to outliers. The mean of each of two random variables is estimated the same way as the mean of an individual variable. The arithmetic average of the sample is determined by adding up all values and dividing by the number of observations in the sample, n.
The covariance between two random variables is a statistical measure of the degree to which the two variables move together. The covariance captures the linear relationship between one variable and another. A positive covariance indicates that the variables tend to move together; a negative covariance indicates that the variables tend to move in opposite directions. What is the covariance of the returns for Stock A and Stock B?
Answer: First, the expected return for each stock must be determined. Note also that covariance can range from negative infinity to positive infinity and is expressed in squared units (e.g., percent squared when returns are measured in percent). Answer: First, it is necessary to convert the variances to standard deviations.
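The covariance question above is answered from a joint probability table that is not reproduced here. The sketch below uses a hypothetical three-outcome table (the returns and probabilities are illustrative assumptions, and `cov_and_corr` is a hypothetical helper) to show the steps: compute each stock's expected return, then take the probability-weighted sum of the products of deviations. Dividing by the two standard deviations then gives the correlation.

```python
import math

# Hypothetical joint outcomes: (return on A, return on B, probability).
# Illustrative numbers only; the probabilities sum to 1.
outcomes = [
    (0.20, 0.10, 0.30),
    (0.10, 0.08, 0.40),
    (-0.05, 0.02, 0.30),
]


def cov_and_corr(outcomes):
    """Covariance and correlation of two returns from a joint probability table."""
    # Step 1: expected return for each stock.
    e_a = sum(p * a for a, _, p in outcomes)
    e_b = sum(p * b for _, b, p in outcomes)
    # Step 2: probability-weighted product of deviations from the expected returns.
    cov = sum(p * (a - e_a) * (b - e_b) for a, b, p in outcomes)
    # Step 3: variances converted to standard deviations, then correlation.
    sd_a = math.sqrt(sum(p * (a - e_a) ** 2 for a, _, p in outcomes))
    sd_b = math.sqrt(sum(p * (b - e_b) ** 2 for _, b, p in outcomes))
    return cov, cov / (sd_a * sd_b)


cov, corr = cov_and_corr(outcomes)
print(f"covariance = {cov:.5f}, correlation = {corr:.3f}")
```

Unlike the covariance, the resulting correlation is unit-free and bounded between −1 and +1.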
Previously, the first and second moments (mean and variance) were applied to pairs of random variables. We can also apply similar techniques to identify the third and fourth moments for pairs of random variables, analogous to the measures of skewness and kurtosis for individual variables. The third cross central moment is known as coskewness, and the fourth cross central moment is known as cokurtosis. Coskewness measures are zero when there is no relationship between the sign of one variable and the occurrence of large moves in the other variable.
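As an illustration of these definitions, the sketch below estimates standardized cross moments from paired data by dividing each central cross moment by the matching powers of the two (population-style) standard deviations. It draws simulated bivariate normal pairs with Python's `random.gauss`; the helper names, sample size, and seed are assumptions for illustration. For a bivariate normal with correlation ρ, the population coskewness is zero and the symmetric cokurtosis k(X,X,Y,Y) equals 1 + 2ρ², so the estimates should sit near those values.

```python
import math
import random


def cross_moment(x, y, p, q):
    """Standardized cross central moment: E[(X - mx)^p (Y - my)^q] / (sx^p * sy^q)."""
    n = len(x)
    mx, my = math.fsum(x) / n, math.fsum(y) / n
    sx = math.sqrt(math.fsum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(math.fsum((b - my) ** 2 for b in y) / n)
    total = math.fsum((a - mx) ** p * (b - my) ** q for a, b in zip(x, y))
    return total / n / (sx ** p * sy ** q)


def bivariate_normal(rho, n, seed=42):
    """Draw n pairs of standard normal variables with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return xs, ys


results = {}
for rho in (0.0, 0.5, 0.9):
    x, y = bivariate_normal(rho, 100_000)
    coskew = cross_moment(x, y, 2, 1)  # s(X,X,Y)
    cokurt = cross_moment(x, y, 2, 2)  # k(X,X,Y,Y)
    results[rho] = (coskew, cokurt)
    print(f"rho={rho:.1f}  coskew={coskew:+.3f}  cokurt={cokurt:.3f}  1+2*rho^2={1 + 2 * rho ** 2:.3f}")
```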
Coskewness is always zero for a bivariate normal distribution because the distribution is symmetric. Note that the first cokurtosis measure, k(X,X,Y,Y), is the symmetric case, in which each variable appears twice (2,2). The asymmetric configurations are (1,3) and (3,1), in which one variable is raised to the third power and the other to the first power.
The symmetrical case provides the sensitivity of the magnitude of one series to the magnitude of the other series. The cokurtosis measure will be large if both series are large in magnitude at the same time.
The other two, asymmetric cases indicate the agreement of the return signs when the return raised to the third power is large in magnitude. The cokurtosis of a bivariate normal distribution depends on the correlation. When the correlation is zero, the returns are independent of one another because the two random variables are jointly normally distributed, and the symmetric cokurtosis k(X,X,Y,Y) is at its minimum. It then rises symmetrically the further the correlation moves away from zero.

A junior analyst is assigned to estimate the first and second moments for an investment.
Sample data was gathered that is assumed to represent the random data of the true population. Which of the following statements best describes the assumptions required to apply the central limit theorem (CLT) in estimating moments of this data set?
A. Only the variance is finite.
B. Both the mean and variance are finite.
C. The random variables are normally distributed.
D. The mean is finite and the random variables are normally distributed.
A distribution of returns that has a greater percentage of extremely large deviations from the mean A.
The correlation of returns between Stocks A and B is 0. The covariance between these two securities is 0. The variance of returns for Stock A is A. Use the following information to answer Question 4. Given this probability matrix, the covariance between Stock A and B is closest to A. An analyst is graphing the cokurtosis and correlation for a pair of bivariate random variables that are normally distributed.
The shape of this graph should be best described as A. It is only an estimate of the true population mean. The central limit theorem (CLT) states that when the sample size is large, the sums of i. Kurtosis is the fourth central moment of a distribution and refers to how fat or thin the tails are in the distribution of data.
Coskewness is zero when there is no relationship between the sign of one variable and the occurrence of large moves in the other variable. D The calculations for the sample mean and sample variance are shown in the following table (columns: Xi, the mean, Xi minus the mean, and the squared deviation). Dividing the sum of the returns by the number of observations, 3, results in an unbiased estimate of the mean of 0.
The third column subtracts the mean from the actual return for each year. The last column squares these deviations from the mean. The sum of the squared deviations is equal to 0. The standard deviation is then 0. D The sample mean is an unbiased estimator of the population mean, because the expected value of the sample mean is equal to the population mean.
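The table-based calculation can be reproduced in a few lines. The three returns below are hypothetical placeholders; the key step is dividing the sum of squared deviations by n − 1, which makes the sample variance an unbiased estimator.

```python
import math
import statistics

returns = [0.12, 0.05, -0.02]  # hypothetical annual returns

n = len(returns)
mean = sum(returns) / n                      # unbiased estimate of the mean
deviations = [r - mean for r in returns]     # return minus the mean
squared = [d ** 2 for d in deviations]       # squared deviations
variance = sum(squared) / (n - 1)            # divide by n - 1 for an unbiased variance
std_dev = math.sqrt(variance)

# statistics.variance applies the same n - 1 divisor.
assert math.isclose(variance, statistics.variance(returns))
print(mean, variance, std_dev)
```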
B The CLT requires that the mean and variance are finite. The CLT does not require assumptions about the distribution of the random variables of the population.
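A short simulation illustrates why the CLT needs a finite mean and variance but not normality. As an assumption for illustration, the population is exponential with rate 1 (mean 1, standard deviation 1, strongly right-skewed); averages of n = 50 draws still cluster around the population mean with a spread close to σ/√n.

```python
import math
import random
import statistics

rng = random.Random(7)
n = 50           # observations per sample
trials = 20_000  # number of samples drawn

# Hypothetical non-normal population: exponential with rate 1 (mean 1, sd 1).
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(trials)
]

print(statistics.fmean(sample_means))   # close to the population mean, 1
print(statistics.pstdev(sample_means))  # close to sigma / sqrt(n)
print(1 / math.sqrt(n))
```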
C A distribution that has a greater percentage of extremely large deviations from the mean will be leptokurtic and will exhibit positive excess kurtosis. The distribution will have fatter tails than a normal distribution.
B
D A symmetrical curved graph with a minimum cokurtosis of 1 when the correlation is 0.

We first focus on hypothesis testing procedures used to conduct tests concerned with population means and population variances. Specific tests reviewed include the z-test and the t-test.
For the exam, you should be able to construct and interpret a confidence interval and know when and how to apply each of the test statistics discussed when conducting hypothesis testing. Hypothesis testing is the statistical assessment of a statement or idea regarding a population. A hypothesis is a statement about the value of a population parameter developed for the purpose of testing a theory or belief.
For example, a researcher may be interested in the mean daily return on stock options. Hence, the hypothesis may be that the mean daily return on a portfolio of stock options is positive. Hypothesis testing procedures, based on sample statistics and probability theory, are used to determine whether a hypothesis is a reasonable statement and should not be rejected or if it is an unreasonable statement and should be rejected. Any hypothesis test has six components: The null hypothesis, which specifies a value of the population parameter that is assumed to be true.
The alternative hypothesis, which specifies the values of the test statistic over which we should reject the null. The test statistic, which is calculated from the sample data.
The size of the test (commonly referred to as the significance level), which specifies the probability of rejecting the null hypothesis when it is true. The critical value, which is the value that is compared to the value of the test statistic to determine whether or not the null hypothesis should be rejected. The decision rule, which is the rule for deciding whether or not to reject the null hypothesis based on a comparison of the test statistic and the critical value.
However, on the exam, recognize that if you see test size, it simply means significance level.

The Null Hypothesis and Alternative Hypothesis
The null hypothesis, designated H0, is the hypothesis the researcher wants to reject. It is the hypothesis that is actually tested and is the basis for the selection of the test statistic. The null is generally a simple statement about a population parameter. The alternative hypothesis, designated HA, is what is concluded if there is sufficient evidence to reject the null hypothesis.
It is usually the alternative hypothesis the researcher is really trying to assess. Because you can never really prove anything with statistics, when the null hypothesis is discredited, the implication is that the alternative hypothesis is valid.
The Choice of the Null and Alternative Hypotheses
The most common null hypothesis is an "equal to" hypothesis. The alternative is often the hoped-for hypothesis. When the null is that a coefficient is equal to zero, we hope to reject it and show the significance of the relationship. When the null is "less than or equal to," the mutually exclusive alternative is framed as "greater than."
If we are trying to demonstrate that a return is greater than the risk-free rate, this would be the correct formulation. We set up the null and alternative hypotheses so that rejection of the null leads to acceptance of the alternative, which is our goal in performing the test.
Hypothesis testing involves two statistics: the test statistic calculated from the sample data and the critical value of the test statistic. Comparing the computed test statistic with the critical value is a key step in assessing the validity of a hypothesis. A test statistic is calculated by comparing the point estimate of the population parameter with the hypothesized value of the parameter (i.e., the value specified in the null hypothesis).
With reference to our option return example, this means we are concerned with the difference between the mean return of the sample and the hypothesized mean return. As indicated in the following expression, the test statistic is the difference between the sample statistic and the hypothesized value, scaled by the standard error of the sample statistic:

$$\text{test statistic} = \frac{\text{sample statistic} - \text{hypothesized value}}{\text{standard error of the sample statistic}}$$

For a test of the population mean, the standard error is $\sigma/\sqrt{n}$; when the population standard deviation is unknown, as in this case, it is estimated using the standard deviation of the sample, s, as $s/\sqrt{n}$. The alternative hypothesis can be one-sided or two-sided. A one-sided test is referred to as a one-tailed test, and a two-sided test is referred to as a two-tailed test.
Whether the test is one- or two-sided depends on the proposition being tested. If a researcher wants to test whether the return on stock options is greater than zero, a one-tailed test should be used. However, a two-tailed test should be used if the research question is whether the return on options is simply different from zero. Two-sided tests allow for deviation on both sides of the hypothesized value (zero). In practice, most hypothesis tests are constructed as two-tailed tests.
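A minimal sketch of the test statistic for a population mean, assuming σ is unknown and estimated by the sample standard deviation s. The function name and the returns are hypothetical; whether the result is compared with a z or a t critical value is a separate choice discussed later in this reading.

```python
import math
import statistics


def mean_test_statistic(sample, hypothesized_mean):
    """(sample mean - hypothesized mean) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # n - 1 divisor
    standard_error = s / math.sqrt(n)
    return (x_bar - hypothesized_mean) / standard_error


# Hypothetical daily returns; H0: mean = 0.
daily_returns = [0.02, -0.01, 0.03, 0.01]
stat = mean_test_statistic(daily_returns, 0.0)
print(stat)
```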
Critical values are obtained from the cumulative probability table for the standard normal distribution (z-table), which is included at the back of this book. If the computed test statistic falls outside the range of critical z-values, the null hypothesis is rejected. Notice that the significance level of 0.
The mean daily return has been 0. The researcher believes the mean daily portfolio return is not equal to zero. Answer: First, we need to specify the null and alternative hypotheses.
The null hypothesis is the one the researcher expects to reject. Note that when we reject the null, we conclude that the sample value is significantly different from the hypothesized value. We are saying that the two values are different from one another after considering the variation in the sample. That is, the mean daily return of 0. If the calculated test statistic is greater than 1. In other words, we reject the null hypothesis. If the calculated test statistic is less than 1.
From the previous example, we know the test statistic for the option return sample is 6. Because 6. Keep in mind that hypothesis testing is used to make inferences about the parameters of a given population on the basis of statistics computed for a sample that is drawn from that population. We must be aware that there is some probability that the sample, in some way, does not represent the population and any conclusion based on the sample about the population may be made in error.
When drawing inferences from a hypothesis test, there are two types of errors: Type I error: the rejection of the null hypothesis when it is actually true.
Type II error: the failure to reject the null hypothesis when it is actually false. When conducting hypothesis tests, a significance level must be specified in order to identify the critical values needed to evaluate the test statistic. The decision for a hypothesis test is to either reject the null hypothesis or fail to reject the null hypothesis. The decision rule for rejecting or failing to reject the null hypothesis is based on the distribution of the test statistic.
For example, if the test statistic follows a normal distribution, the decision rule is based on critical values determined from the standard normal distribution (z-distribution). Regardless of the appropriate distribution, it must be determined whether a one-tailed or two-tailed hypothesis test is appropriate before a decision rule (rejection rule) can be determined.
A decision rule is specific and quantitative. Once we have determined whether a one- or two-tailed test is appropriate, the significance level we require, and the distribution of the test statistic, we can calculate the exact critical value for the test statistic.
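For a test statistic that follows the standard normal distribution, the exact critical values can be computed instead of read from the z-table. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8+); `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist


def critical_z(alpha, tail="two"):
    """Critical z-value for significance level alpha.

    tail="two"   -> reject if |z| > returned value
    tail="upper" -> reject if z > returned value
    tail="lower" -> reject if z < returned value
    """
    z = NormalDist()  # standard normal: mean 0, sd 1
    if tail == "two":
        return z.inv_cdf(1 - alpha / 2)
    if tail == "upper":
        return z.inv_cdf(1 - alpha)
    if tail == "lower":
        return z.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(critical_z(alpha), 3), round(critical_z(alpha, "upper"), 3))
```

With α = 0.05, the two-tailed decision rule becomes "reject H0 if the test statistic is greater than about 1.96 or less than about −1.96."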
Then we have a decision rule of the following form: if the test statistic is greater (less) than the value X, reject the null. While the significance level of a test is the probability of rejecting the null hypothesis when it is true, the power of a test is the probability of correctly rejecting the null hypothesis when it is false. In other words, the probability of rejecting the null when it is false (the power of the test) equals one minus the probability of not rejecting the null when it is false (the probability of a Type II error).
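Power can be computed directly in a simple case. Assuming an upper-tailed z-test of a mean with known σ, and a true mean that exceeds the hypothesized mean by δ, the z-statistic is centered at δ√n/σ under the alternative, so power = 1 − Φ(z_α − δ√n/σ). The function name and inputs below are hypothetical. Holding α fixed, power rises with n; holding n fixed, raising α raises power at the cost of more Type I errors.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_upper_tail(delta, sigma, n, alpha):
    """Power of an upper-tailed z-test when the true mean exceeds the null mean by delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    shift = delta * math.sqrt(n) / sigma  # mean of the z-statistic under the alternative
    return 1 - STD_NORMAL.cdf(z_crit - shift)


# Hypothetical inputs: true excess mean 0.1, sigma 1.
for n in (25, 100, 400):
    print(n,
          round(power_upper_tail(0.1, 1.0, n, 0.05), 3),
          round(power_upper_tail(0.1, 1.0, n, 0.10), 3))
```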
When more than one test statistic may be used, the power of the test for the competing test statistics may be useful in deciding which test statistic to use. Ordinarily, we wish to use the test statistic that provides the most powerful test among all possible tests. The relation is not simple, however, and calculating the probability of a Type II error in practice is quite difficult.
Conversely, for a given sample size, we can increase the power of a test only at the cost of increasing the probability of rejecting a true null (a Type I error). For a given significance level, we can decrease the probability of a Type II error and increase the power of a test only by increasing the sample size. A confidence interval is a range of values within which the researcher believes the true population parameter may lie.
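A sketch of a confidence interval for a mean, assuming a large sample so that a z critical value is appropriate. The inputs are hypothetical and `z_confidence_interval` is an illustrative helper. The interval is the point estimate plus or minus the critical value times the standard error, and a two-tailed test at the matching significance level rejects a hypothesized value exactly when that value falls outside the interval.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, s, n, confidence=0.95):
    """Two-sided confidence interval for a mean: x_bar +/- z * s / sqrt(n)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical inputs.
low, high = z_confidence_interval(x_bar=0.001, s=0.0025, n=250)
reject_zero = not (low <= 0.0 <= high)  # equivalent two-tailed test of H0: mean = 0
print(low, high, reject_zero)
```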
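A confidence interval is constructed as:

$$\text{point estimate} \pm (\text{critical value} \times \text{standard error})$$

From this expression, we see that a confidence interval and a hypothesis test are linked by the critical value. Use a z-distribution. Answer: Given a sample size of with a standard deviation of 0.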
Thus, given a sample mean equal to 0.

Statistical Significance vs. Practical Significance
Statistical significance does not necessarily imply practical significance. For example, we may have tested a null hypothesis that a strategy of going long all the stocks that satisfy some criteria and shorting all the stocks that do not satisfy the criteria resulted in returns that were less than or equal to zero over a 20-year period.
Assume we have rejected the null in favor of the alternative hypothesis that the returns to the strategy are greater than zero (positive). This does not necessarily mean that investing in that strategy will result in economically meaningful positive returns. Several factors must be considered.
One important consideration is transactions costs. Once we consider the costs of buying and selling the securities, we may find that the mean positive returns to the strategy are not enough to generate positive returns after costs. Taxes are another factor that may make a seemingly attractive strategy a poor one in practice. A third reason that statistically significant results may not be economically significant is risk.
In the strategy just discussed, we have additional risk from short sales (they may have to be closed out earlier than in the test strategy). Because the statistically significant results were for a period of 20 years, it may be the case that there is significant variation from year to year in the returns from the strategy, even though the mean strategy return is greater than zero.
This variation in returns from period to period is an additional risk to the strategy that is not accounted for in our test of statistical significance.
Any of these factors could make committing funds to a strategy unattractive, even though the statistical evidence of positive returns is highly significant. By the nature of statistical tests, a very large sample size can result in highly statistically significant results that are quite small in absolute terms.
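A quick calculation shows how sample size alone can drive significance. Holding a hypothetical mean return of 0.01% per period and a standard deviation of 1% fixed, the z-statistic for H0: mean = 0 grows with √n, so a negligible effect becomes highly significant once n is large enough.

```python
import math

mean_return = 0.0001  # 0.01% per period: hypothetical and economically tiny
std_dev = 0.01        # 1% per period: hypothetical

z_by_n = {}
for n in (100, 10_000, 1_000_000):
    z_by_n[n] = mean_return / (std_dev / math.sqrt(n))  # z-statistic for H0: mean = 0
    print(f"n = {n:>9,}  z = {z_by_n[n]:.1f}")
```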
The appropriate alternative hypothesis is A. Which of the following statements about hypothesis testing is most accurate? The power of a test is one minus the probability of a Type I error. The probability of a Type I error is equal to the significance level of the test.
If you can disprove the null hypothesis, then you have proven the alternative hypothesis.

The p-Value
The p-value is the probability of obtaining a test statistic at least as extreme as the one computed from the sample, assuming the null hypothesis is true.
It is the smallest level of significance for which the null hypothesis can be rejected. For one-tailed tests, the p-value is the probability that lies above the computed test statistic for upper tail tests or below the computed test statistic for lower tail tests. For two-tailed tests, the p-value is the probability that lies above the positive value of the computed test statistic plus the probability that lies below the negative value of the computed test statistic.
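The p-value definitions above translate directly into code. The sketch assumes the test statistic is standard normal under the null; `z_p_value` is a hypothetical helper built on `statistics.NormalDist`.

```python
from statistics import NormalDist


def z_p_value(z, tail="two"):
    """p-value of a z test statistic under a standard normal null."""
    phi = NormalDist().cdf
    if tail == "upper":
        return 1 - phi(z)              # probability above the statistic
    if tail == "lower":
        return phi(z)                  # probability below the statistic
    if tail == "two":
        return 2 * (1 - phi(abs(z)))   # probability in both tails
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


alpha = 0.05
p = z_p_value(2.0)
print(p, "reject H0" if p < alpha else "fail to reject H0")
```

Comparing the p-value with α gives the same decision as comparing the test statistic with the critical value.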
If we consult the z-table, we find the probability of getting a value greater than 2. Many researchers report p-values without selecting a significance level and allow the reader to judge how strong the evidence for rejection is.

The t-Test
The t-test is a widely used hypothesis test that employs a test statistic that is distributed according to a t-distribution. Following are the rules for when it is appropriate to use the t-test for hypothesis tests of the population mean.
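A sketch of the t-test for a population mean. The t-statistic uses the sample standard deviation in the standard error and has n − 1 degrees of freedom. Python's standard library has no t-distribution, so the critical value is passed in from a t-table (2.262 is the two-tailed 5% value for 9 degrees of freedom); the helper name and the sample are hypothetical.

```python
import math
import statistics


def t_test_mean(sample, hypothesized_mean, t_critical):
    """Two-tailed t-test of a population mean with unknown variance.

    Returns the t-statistic, its degrees of freedom (n - 1), and whether
    |t| exceeds the supplied critical value.
    """
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation
    t_stat = (statistics.fmean(sample) - hypothesized_mean) / (s / math.sqrt(n))
    return t_stat, n - 1, abs(t_stat) > t_critical


# Hypothetical daily returns (%), n = 10 -> 9 degrees of freedom.
returns = [0.4, -0.2, 0.6, 0.1, 0.3, 0.5, -0.1, 0.2, 0.7, 0.0]
t_stat, df, reject = t_test_mean(returns, 0.0, t_critical=2.262)
print(t_stat, df, reject)
```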
It is the same test statistic computation that we have been performing all along. Note the use of the sample standard deviation, s, in the standard error term in the denominator. To conduct a t-test, the t-statistic is compared to a critical t-value at the desired level of significance with the appropriate degrees of freedom. In the real world, the underlying variance of the population is rarely known, so the t-test enjoys widespread application.

The z-Test
The z-test is the appropriate hypothesis test of the population mean when the population is normally distributed with known variance.
The computed test statistic used with the z-test is referred to as the z-statistic. Critical z-values for the most common levels of significance are displayed in Figure You should memorize these critical values for the exam. Remember, this is acceptable if the sample size is large, although the t-statistic is the more conservative measure when the population variance is unknown. Referring to our previous option portfolio mean return problem once more, determine which test statistic z or t should be used and the difference in the likelihood of rejecting a true null with each distribution.
Answer: The population variance for our sample of returns is unknown. Hence, the t-distribution is appropriate. With observations, however, the sample is considered to be large, so the z-distribution would also be acceptable. This is a trick question: either distribution, t or z, is appropriate. With regard to the difference in the likelihood of rejecting a true null, because our sample is so large, the critical values for the t and z are almost identical.
Hence, there is almost no difference in the likelihood of rejecting a true null. However, from time to time the machine gets out of alignment and produces gizmos that are either too long or too short.
When this happens, production is stopped and the machine is adjusted. To check the machine, the quality control department takes a gizmo sample each day. Today, a random sample of 49 gizmos showed a mean length of 2. The population standard deviation is known to be 0. A common hypothesis testing procedure is outlined as follows: Statement of hypothesis. Note that because this is a two-tailed test, HA allows for values above and below 2. Select the appropriate test statistic.
State the decision rule regarding the hypothesis. Because the total area of both rejection regions combined is 0. The decision rule can be stated as: Reject H0 if: z-statistic z0.
The value of x from the sample is 2. Hence, there is sufficient evidence to reject H0. Make a decision based on the results of the test.
Based on the sample information and the results of the test, it is concluded that the machine is out of adjustment and should be shut down for repair.

Testing the Equality of Means
In finance, we are often interested in testing whether the means of two populations are equal to each other. This is equivalent to testing whether the difference between the two means is zero. If we assume two series X and Y are each independent and identically distributed (i.i.d.), the steps to test the hypothesis that the means are equal follow the standard hypothesis testing procedure.
The null hypothesis is that the difference between the two means is equal to zero, versus the alternative that it is not equal to zero. Given the test size and the appropriate critical value, the null is either rejected or not rejected by comparing the test statistic to the critical value.
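One way to carry out this test, sketched here under the assumption that the two series are observed in pairs over the same periods, is to form the difference series Z = X − Y and test whether its mean is zero. Working with the differences means the sample variance of Z already reflects Var(X) + Var(Y) − 2Cov(X, Y). The helper name is hypothetical, and the standard normal critical value assumes a large sample.

```python
import math
import statistics
from statistics import NormalDist


def equal_means_test(x, y, alpha=0.05):
    """Test H0: E[X] - E[Y] = 0 via the paired difference series Z = X - Y.

    Uses a standard normal critical value, so it assumes a large sample.
    """
    if len(x) != len(y):
        raise ValueError("paired series must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    stat = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return stat, abs(stat) > NormalDist().inv_cdf(1 - alpha / 2)


# Tiny illustrative call; a real test would use many observations.
stat, reject = equal_means_test([1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 2.0, 2.0])
print(stat, reject)
```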
Multiple Hypothesis Testing
Multiple testing means testing multiple different hypotheses on the same data set.
For example, suppose we are testing 10 active trading strategies against a buy-and-hold trading strategy. The problem is that if we keep testing different strategies against the same null hypothesis, it is highly likely we will eventually reject one of them. The problem with this is that alpha (the probability of incorrectly rejecting a true null) is only accurate for one single hypothesis test.
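The inflation of alpha can be quantified under the simplifying assumption that the tests are independent: if each of m tests has size α, the probability of at least one Type I error is 1 − (1 − α)^m. For the 10 strategies in the example, each tested at 5%, that is roughly 40%. The helper name is hypothetical.

```python
def familywise_error_rate(alpha, m):
    """P(at least one Type I error) across m independent tests, each of size alpha."""
    return 1 - (1 - alpha) ** m


for m in (1, 10, 20):
    print(m, round(familywise_error_rate(0.05, m), 3))
```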
As we test more and more strategies, the actual alpha of this repeated testing grows larger, and as alpha grows larger, the probability of a Type I error increases. The most likely bias to result from testing multiple hypotheses on a single data set is that the value of A. The value of the calculated test statistic is closest to A. The test statistic is the value that a decision about a hypothesis will be based on.
A one-tailed test results from a one-sided alternative hypothesis e. A hypothesis about a population parameter is rejected when the sample statistic lies outside a confidence interval around the hypothesized value for the chosen level of significance. The problem with multiple testing is that the alpha is only accurate for one single hypothesis test. As we test more and more strategies, the alpha of this repeated testing grows larger, and as alpha grows larger, the probability of a Type I error increases.
B The probability of getting a test statistic outside the critical value(s) when the null is true is the level of significance and is the probability of a Type I error. The power of a test is one minus the probability of a Type II error. Hypothesis testing does not prove a hypothesis; we either reject the null or fail to reject it. A With multiple testing, alpha (the probability of incorrectly rejecting a true null) is only accurate for one single hypothesis test.
Typically, we estimate a regression equation using ordinary least squares (OLS), which minimizes the sum of squared errors in the sample data.
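For a single explanatory variable, the OLS estimates have a closed form: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of X, and the intercept makes the fitted line pass through the point of means. A minimal sketch with hypothetical data and hypothetical helper names:

```python
import statistics


def ols_simple(x, y):
    """Intercept and slope that minimize the sum of squared residuals."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_simple(x, y)
print(b0, b1, sum_squared_residuals(x, y, b0, b1))
```

Nudging either estimate away from its OLS value increases the sum of squared residuals, which is what minimizing squared errors means in practice.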
For the exam, be able to conduct hypothesis tests, calculate confidence intervals, and remember the assumptions underlying the regression model. Finally, understand how to interpret a regression equation. Regression analysis seeks to measure how changes in one variable (called the dependent or explained variable) can be explained by changes in one or more other variables (called the independent or explanatory variables). This relationship is captured by estimating a linear equation.
As an example, we want to capture the relationship between hedge fund returns and lockup periods. For this simple two-variable case i.

Linear Regression Conditions
To use linear regression, three conditions need to be satisfied:
1. The relationship between Y and X should be linear (discussed later).
2. The error term must be additive.
3. All X variables should be observable.
The term linear has implications for both the independent variable(s) and the unknown parameters. However, appropriate transformations of the independent variable(s) can make a nonlinear relationship amenable to being fitted with a linear model. If the relationship between the dependent variable Y and an independent variable X is nonlinear, an analyst would make that transformation first and then enter the transformed value into the linear equation as X.
For example, in estimating a utility function as a function of consumption, we might allow for the property of diminishing marginal utility by transforming consumption into a logarithm of consumption. A second interpretation of the term linear applies to the unknown coefficients. It specifies that the dependent variable is a linear function of the coefficients. Generally, if the value of the independent variable is zero, then the expected value of the dependent variable would be equal to A.
The error term represents the portion of A. A linear regression function assumes that the relation being modeled must be linear in A. In the case where the model uses multiple independent variables, the slope coefficient on each independent variable captures the change in the dependent variable for a one-unit change in that independent variable, holding the other independent variables constant.
As you will see in the next reading, this is why the slope coefficients in a multiple regression are sometimes called partial slope coefficients.

Dummy Variables
Observations for most independent variables e.
However, there are occasions when the independent variable is binary in nature: it is either on or off. Independent variables that fall into this category are called dummy variables and are often used to quantify the impact of qualitative variables.
Dummy variables are assigned a value of 0 or 1. For example, in a time series regression of monthly stock returns, you could employ a January dummy variable that would take on the value of 1 if a stock return occurred in January, and 0 if it occurred in any other month.
The purpose of including the January dummy variable would be to see if stock returns in January were significantly different than stock returns in all other months of the year.
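With a single 0/1 dummy as the only regressor, the OLS estimates have a simple reading: the intercept is the mean return in non-January months, and the slope is the January mean minus that mean. The sketch below checks this with hypothetical monthly returns, using the same closed-form OLS formulas as for any single regressor.

```python
import statistics

# Hypothetical monthly returns (%) for two years; months are numbered 1..12.
months = list(range(1, 13)) * 2
returns = [3.0, 0.5, 1.0, -0.5, 0.8, 0.2, 1.1, -1.0, 0.4, 0.9, 1.2, 0.6,
           2.0, 0.3, -0.2, 0.7, 0.1, 0.5, 0.9, -0.4, 0.6, 1.0, 0.2, 0.8]
january = [1.0 if m == 1 else 0.0 for m in months]  # the dummy variable

# Closed-form OLS for a single regressor.
x_bar, y_bar = statistics.fmean(january), statistics.fmean(returns)
slope = (sum((d - x_bar) * (r - y_bar) for d, r in zip(january, returns))
         / sum((d - x_bar) ** 2 for d in january))
intercept = y_bar - slope * x_bar

jan_mean = statistics.fmean(r for r, d in zip(returns, january) if d == 1.0)
other_mean = statistics.fmean(r for r, d in zip(returns, january) if d == 0.0)
print(intercept, other_mean)          # intercept = mean of non-January months
print(slope, jan_mean - other_mean)   # slope = January mean minus other-month mean
```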
Coefficient of Determination of a Regression (R²)
The R² of a regression model captures the fit of the model; it represents the proportion of variation in the dependent variable that is explained by the independent variable(s). For a regression model with a single independent variable, R² is the square of the correlation between the independent and dependent variable. OLS regression requires a number of assumptions. The first is that the expected value of the error term, conditional on the independent variable, is zero: $E(\varepsilon \mid X) = 0$. This assumption is not directly testable; OLS estimates using sample data ensure that the shocks are always uncorrelated with Xs.
Evaluation of whether this assumption is reasonable requires an examination of the data generating process. Generally, a violation would be evidenced by the following: Survivorship, or sample selection, bias: Survivorship bias occurs when the observations are collected after-the-fact e. Sample selection bias occurs when occurrence of an event i.
For example, mortgage refinancing is severely curtailed when housing prices are falling; hence, actual refinancing transactions are more likely to be observed in a rising home-price environment. Simultaneity bias: This happens when the values of X and Y are simultaneously determined.
For example, trading volume and volatility are related; volume increases during volatile times. Omitted variables: Important explanatory i. If they are, the errors will capture the influence of the omitted variables.
Omission of important variables causes the coefficients to be biased and may indicate nonexistent i. Attenuation bias: This occurs when X variables are measured with error, and it biases the regression coefficients toward zero (i.e., their magnitudes are underestimated).
All (X, Y) observations are independent and identically distributed (i.i.d.). The variance of the errors is constant (i.e., the errors are homoskedastic). It is unlikely that large outliers will be observed in the data. OLS estimates are sensitive to outliers, and large outliers have the potential to create misleading regression results.
Second, they ensure that the estimators are normally distributed and, as a result, allow for hypothesis testing (discussed later).
Because OLS estimators are derived from random samples, they are themselves random variables: they vary from one sample to the next.
Therefore, OLS estimators have their own probability distributions (i.e., sampling distributions). These sampling distributions allow us to estimate population parameters, such as the population mean, the population regression intercept term, and the population regression slope coefficient.
Drawing multiple samples from a population will produce multiple sample means. The distribution of these sample means is referred to as the sampling distribution of the sample mean. Because the mean of this sampling distribution equals the population mean, the sample mean is said to be an unbiased estimator of the population mean. Given the central limit theorem, for large sample sizes, it is reasonable to assume that the sampling distribution will approach the normal distribution.
The sample mean is also a consistent estimator. A consistent estimator is one for which the accuracy of the parameter estimate increases as the sample size increases. Like the sampling distribution of the sample mean, the OLS estimators of the population intercept term and slope coefficient also have sampling distributions.
This makes sense because the variance of the slope indicates the reliability of the sample estimate of the coefficient, and the higher the variance of the error, the lower the reliability of the coefficient estimate. Higher variance of the explanatory X variable(s) indicates that there is sufficient diversity in observations i. Ordinary least squares (OLS) refers to the process that A. What is the most appropriate interpretation of a slope coefficient estimate equal to The predicted value of the dependent variable when the independent variable is zero is The predicted value of the independent variable when the dependent variable is zero is 0.
For every one unit change in the independent variable, the model predicts that the dependent variable will change by 10 units. For every one unit change in the independent variable, the model predicts that the dependent variable will change by 0.
The reliability of the estimate of the slope coefficient in a regression model is most likely A. The mean inflation Y over the past months is 0.
Mean unemployment during that same time period (X) is 0. A researcher estimates that the value of the slope coefficient in a single explanatory variable linear regression model is equal to zero. Which one of the following is the most appropriate interpretation of this result? The mean of the Y variable is zero. The intercept of the regression is zero. The relation between X and Y is not linear. The coefficient of determination (R²) of the model is zero.
The steps in the hypothesis testing procedure for regression coefficients are as follows:
1. Specify the hypothesis to be tested.
2. Calculate the test statistic.
3. Reject or fail to reject the null hypothesis after comparing the test statistic to its critical value.
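The three steps can be sketched for a single-regressor model. Assumptions for this illustration: the standard error of the slope is estimated as √(SSR/(n − 2)) / √(Σ(x − x̄)²), the test uses n − 2 degrees of freedom, and the critical value t_c is taken from a t-table (3.182 is the two-tailed 5% value for 3 degrees of freedom) because Python's standard library has no t-distribution. The helper name and data are hypothetical. The same t_c also gives the confidence interval for the slope.

```python
import math
import statistics


def slope_t_test(x, y, b1_null, t_critical):
    """t-test and confidence interval for the slope of a single-regressor OLS model."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # sum of squared residuals
    se_b1 = math.sqrt(ssr / (n - 2)) / math.sqrt(sxx)           # standard error of the slope
    t_stat = (b1 - b1_null) / se_b1
    ci = (b1 - t_critical * se_b1, b1 + t_critical * se_b1)
    return b1, t_stat, n - 2, abs(t_stat) > t_critical, ci


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
# Step 1: H0: slope = 0. Steps 2-3: compute t and compare with t_c (df = 3).
result = slope_t_test(x, y, b1_null=0.0, t_critical=3.182)
print(result)
```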
We can then conduct hypothesis testing using the sample value of the coefficient and its standard error. Here, tc is the critical t-value for a given level of significance and degrees of freedom n − 2. An alternative method of hypothesis testing for regression coefficients is to compare the p-value to the significance level: if the p-value is less than the significance level, the null hypothesis can be rejected.
If the p-value is greater than the significance level, the null hypothesis cannot be rejected. For the regression model involving inflation as the explanatory variable, the confidence interval for the slope coefficient is closest to A. For the regression model involving unemployment rate as the explanatory variable, what are the results of a hypothesis test that the slope coefficient is equal to 0. The coefficient is not significantly different from 0.
The coefficient is significantly different from 0 because the p-value is 0. The coefficient is significantly different from 0 because the t-value is 2. The coefficient is not significantly different from 1 because t-value is 0.
To use linear regression, the following three conditions need to be satisfied:
1. The relationship between Y and X should be linear.
2. The variance of the error term is independent of the observed data.
3. All X variables should be observable.
The variance of the errors is constant i. Specify the hypothesis. If the p-value is less than the significance level, the null hypothesis can be rejected; otherwise we fail to reject the null.
A The error term represents effects from independent variables not included in the model. It could be explained by additional independent variables. D OLS is a process that minimizes the sum of squared residuals to produce estimates of the population parameters known as sample regression coefficients.
C The slope coefficient is best interpreted as the predicted change in the dependent variable for a one-unit change in the independent variable.
If the slope coefficient estimate is The intercept term is best interpreted as the value of the dependent variable when the independent variable is equal to zero. The p-value of 0. C The p-value provided is for hypothesized value of the slope coefficient being equal to 0. The hypothesized coefficient value is 0. C When the p-value is less than the level of significance, the slope coefficient is significantly different from 0.
For the exam, be able to evaluate and calculate goodness-of-fit measures such as R2 and adjusted R2 as well as hypothesis testing related to these concepts.
Hypothesis testing of individual slope coefficients in a multiple regression model, as well as confidence intervals for those coefficients, is also important testable material. In this reading we extend our regression function to include multiple explanatory variables, the form most commonly used in practice.
Recall the assumptions of the single regression model, modified for multiple Xs: 1. All Xs and Y observations are i.i.d. There are no outliers observed in the data. An additional sixth assumption is needed for multiple regression:
6. X variables are not perfectly correlated. In other words, each X variable in the model should have some variation that is not fully explained by the other X variables.
For a multiple regression, the interpretation of the slope coefficient is that it captures the change in the dependent variable for a one unit change in independent variable, holding the other independent variables constant. As a result, the slope coefficients in a multiple regression are sometimes called partial slope coefficients.
The ordinary least squares (OLS) estimation process for multiple regression differs from single regression. In a stepwise fashion, first, the individual explanatory variables are regressed against the other explanatory variables, and the residuals from these models become the explanatory variables in a regression using the original dependent variable.
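This stepwise description is known as the Frisch-Waugh-Lovell result. A minimal two-regressor sketch: regress X1 on X2 and keep the residuals (the part of X1 not explained by X2), then regress Y on those residuals; the resulting slope equals X1's coefficient in the full multiple regression. The helper names are hypothetical and the data in the note below are made up.

```python
def simple_ols(xs, ys):
    """Intercept and slope from a single-regressor OLS fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def partial_slope(y, x1, x2):
    """Coefficient on x1 in the regression of y on x1 and x2, computed stepwise.

    Step 1: regress x1 on x2 and keep the residuals.
    Step 2: regress y on those residuals; the slope is x1's partial slope coefficient.
    """
    a0, a1 = simple_ols(x2, x1)
    x1_residuals = [v1 - (a0 + a1 * v2) for v1, v2 in zip(x1, x2)]
    _, b1 = simple_ols(x1_residuals, y)
    return b1
```

With x1 = (1, ..., 6), x2 = (2, 1, 4, 3, 6, 5), and y constructed exactly as 1 + 2·x1 + 3·x2, `partial_slope` returns 2 (up to rounding). A single regression of y on x1 alone gives a much larger slope, because x1 also picks up x2's effect through their correlation.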
We would expect this to happen most of the time when a second variable is added to the regression, unless X2 is uncorrelated with X1, because if X1 increases by 1 unit, then we would expect X2 to change as well. The multiple regression equation captures this relationship between X1 and X2 when predicting Y. Now the interpretation of the estimated slope coefficient for X1 is that if X1 increases by 1 unit, we would expect Y to increase by 2. As usual, the intercept 1. Calculate the following: 1.
Based on the results in the table, which of the following most accurately represents the regression equation? The expected amount of the stock return attributable to it being a Fortune stock is closest to A. Which of the following is not an assumption of single regression? There are no outliers in the data. The variance of the independent variables is greater than zero.
Independent variables are not perfectly correlated. Residual variances are homoskedastic. The standard error of the regression (SER) measures the uncertainty about the accuracy of the predicted values of the dependent variable. Graphically, the relationship is stronger when the actual (x, y) data points lie closer to the regression line i.
Recall that OLS estimation minimizes the sum of the squared differences between the predicted value and the actual value for each observation. This proportion is the coefficient of determination (R2) of a multiple regression and is a goodness-of-fit measure. While it is a goodness-of-fit measure, R2 by itself may not be a reliable measure of the explanatory power of the multiple regression model, for three reasons. First, R2 almost always increases as independent variables are added to the model, even if the marginal contribution of the new variables is not statistically significant.
Consequently, a relatively high R2 may reflect the impact of a large set of independent variables rather than how well the set explains the dependent variable. This problem is often referred to as overestimating the regression.
Adjusted R2
To overcome the problem of overestimating the impact of additional variables on the explanatory power of a regression model, many researchers recommend adjusting R2 for the number of independent variables.
So, while adding a new independent variable to the model will increase R2, it may either increase or decrease the adjusted R2. If the new variable has only a small effect on R2, the value of the adjusted R2 may decrease.
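Both measures can be computed directly. A minimal sketch using the standard definitions R2 = 1 - RSS/TSS and adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables; the function names and the numbers in the note below are hypothetical.

```python
def r_squared(tss, rss):
    """Coefficient of determination: share of the total variation that is explained."""
    return 1.0 - rss / tss


def adjusted_r_squared(r2, n, k):
    """R2 adjusted for k independent variables and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
```

With TSS = 100, RSS = 40, n = 30, and k = 3, R2 = 0.60 and adjusted R2 is about 0.554. If a fourth variable lifts R2 only to 0.605, adjusted R2 falls to about 0.542; with R2 = 0.05, n = 20, and k = 5, adjusted R2 is negative (about -0.29).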
In addition, the adjusted R2 may be less than zero if the R2 is low enough. Finally, there are no clear predefined values of R2 that indicate whether the model is good or not. For some noisy variables e. The total sum of squares for the regression is , and the residual sum of squares is Calculate the R2 and adjusted R2.
Identify which model the analyst would most likely prefer. As with single regression, the magnitude of the coefficients in a multiple regression tells us nothing about the importance of the independent variable in explaining the dependent variable. Thus, we must conduct hypothesis testing on the estimated slope coefficients to determine if the independent variables make a significant contribution to explaining the variation in the dependent variable.
The results of the regression are produced in the following table. Instead, we use the F-test.
The F-test
An F-test is useful to evaluate a model against other competing partial models. For example, a model with three independent variables X1, X2, and X3 can be compared against a model with only one independent variable X1. We are trying to see if the two additional variables X2 and X3 in the full model contribute meaningfully to explaining the variation in Y.
If the calculated F-stat is greater than the critical F-value, the full model contributes meaningfully to explaining the variation in Y. Using a sample consisting of 54 observations, the researcher found that RSS in the model with three explanatory variables is 6, while the RSS in the single variable model is 7, Evaluate the model with extra variables relative to the standard CAPM formulation. Note that one of the two variables removed from the full model may still be insignificant; we are only concluding here that the two variables are not both insignificant.
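The comparison can be sketched with the standard nested-model statistic F = [(RSS_R - RSS_U) / q] / [RSS_U / (n - k - 1)], where RSS_R comes from the restricted (smaller) model, RSS_U from the full model with k independent variables, and q is the number of variables dropped. This is a minimal sketch; the function name is hypothetical and the critical F-value (q and n - k - 1 degrees of freedom) is assumed to come from an F-table.

```python
def nested_f_stat(rss_restricted, rss_unrestricted, n, k_full, q):
    """F-statistic comparing a full model with k_full regressors to a restricted
    model that drops q of them.

    F = [(RSS_R - RSS_U) / q] / [RSS_U / (n - k_full - 1)]
    """
    numerator = (rss_restricted - rss_unrestricted) / q
    denominator = rss_unrestricted / (n - k_full - 1)
    return numerator / denominator
```

Using the setup above (n = 54, a full model with k = 3, and a restricted model that drops q = 2 variables) but made-up RSS values of 120 (restricted) and 100 (full), F = (20/2) / (100/50) = 5.0, to be compared with the critical F-value with (2, 50) degrees of freedom. Making the restricted model intercept-only (RSS_R = TSS, q = k) gives the overall F-test; for example, hypothetical values TSS = 200, RSS = 100, n = 50, and k = 5 give F = 8.8.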
A more generic F-test is used to test the hypothesis that none of the variables included in the model contributes meaningfully to explaining the variation in Y, against the alternative that at least one of the variables does contribute statistically significantly. The total sum of squares is , and the residual sum of squares is Therefore, we can reject the null hypothesis and conclude that at least one of the five independent variables is significantly different from zero. In addition, all 50 stocks in the sample come from two industries, electric utilities or biotechnology.
Variable Coefficient t-Statistic Intercept 6. Based on these results, it would be most appropriate to conclude that A. Ohlmer is valuing a biotechnology stock with a dividend payout ratio of 0. When interpreting the R2 and adjusted R2 measures for a multiple regression, which of the following statements incorrectly reflects a pitfall that could lead to invalid conclusions? The R2 measure does not provide evidence that the most or least appropriate independent variables have been selected.
If the R2 is high, we have to assume that we have found all relevant independent variables. If adding an additional independent variable to the regression improves the R2, this variable is not necessarily statistically significant. The R2 measure may be spurious, meaning that the independent variables may show a high R2; however, they are not the exact cause of the movement in the dependent variable.
So, each X variable should have some variation that is not fully explained by the other X variables. C The coefficients column contains the regression parameters. D The regression equation is 0. C This is an assumption for multiple regression and not for single regression.
Remember, however, this is only accurate if we hold the other independent variables in the model constant. B If the R2 is high, we cannot assume that we have found all relevant independent variables.
Omitted variables may still exist, which would improve the regression results further. For the exam, be able to explain the effects of heteroskedasticity and multicollinearity on a regression. Also, understand the bias-variance tradeoff and the consequences of including an irrelevant explanatory variable versus excluding a relevant explanatory variable.
If the variance of the residuals is constant across all observations in the sample, the regression is said to be homoskedastic. When the opposite is true, the regression exhibits heteroskedasticity, which occurs when the variance of the residuals is not the same across all observations in the sample.
This happens when there are subsamples that are more spread out than the rest of the sample. While this is a violation of the equal variance assumption, it usually causes no major problems with the regression. Conditional heteroskedasticity is heteroskedasticity that is related to the level of i. For example, conditional heteroskedasticity exists if the variance of the residual term increases as the value of the independent variable increases, as shown in Figure Notice in this figure that the residual variance associated with the larger values of the independent variable, X, is larger than the residual variance associated with the smaller values of X.
Conditional heteroskedasticity does create significant problems for statistical inference. The coefficient estimates are not affected, but the standard errors are unreliable, and as a result hypothesis testing is unreliable.
Detecting Heteroskedasticity
As shown in Figure … Formally, a chi-squared test statistic can be computed as follows:
1. Estimate the regression and obtain the residuals.
2. Use the squared estimated residuals from step 1 as the dependent variable in a new regression on the original explanatory variables.
3. Compute the chi-squared statistic from the R2 of this second regression and compare it with the chi-squared critical value.
Correcting for Heteroskedasticity
If conditional heteroskedasticity is detected, we can conclude that the coefficients are unaffected but the standard errors are unreliable.
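The detection steps and the White correction can be sketched for a single regressor. Assumptions, not taken from the text: the statistic uses the Breusch-Pagan form n × R2 from the auxiliary regression, which is chi-squared with degrees of freedom equal to the number of explanatory variables (1 here, so the 5% critical value is 3.841), and the robust standard error is the basic White (HC0) estimator without a small-sample adjustment. All function names are hypothetical.

```python
import math


def simple_ols(xs, ys):
    """Intercept, slope, and residuals from a single-regressor OLS fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return intercept, slope, residuals


def r_squared(ys, residuals):
    """R2 = 1 - RSS / TSS for a fitted regression of ys."""
    y_bar = sum(ys) / len(ys)
    tss = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - sum(e * e for e in residuals) / tss


def heteroskedasticity_chi2(xs, ys):
    """Breusch-Pagan style statistic n * R2 (chi-squared with 1 df here)."""
    _, _, resid = simple_ols(xs, ys)                # step 1: residuals
    squared = [e * e for e in resid]
    _, _, aux_resid = simple_ols(xs, squared)       # step 2: squared residuals on X
    return len(xs) * r_squared(squared, aux_resid)  # step 3: statistic from R2


def slope_standard_errors(xs, ys):
    """Slope plus conventional OLS and White (HC0) standard errors for it."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    _, slope, resid = simple_ols(xs, ys)
    ols_se = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # White: weight each squared residual by its own squared x-deviation.
    white_var = sum((x - x_bar) ** 2 * e * e for x, e in zip(xs, resid)) / sxx ** 2
    return slope, ols_se, math.sqrt(white_var)
```

On the toy data x = (1, 2, 3, 4, 5) and y = (2, 4, 5, 4, 5), the conventional slope standard error is about 0.283 and the White standard error about 0.185. With hypothetical data whose errors grow in proportion to x, `heteroskedasticity_chi2` comes out well above 3.841, while alternating errors of constant size leave it near zero.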
In such a case, revised, White standard errors should be used in hypothesis testing instead of the standard errors from OLS estimation procedures. The introduction of these robust standard errors is credited to Halbert White, a well-known professor in econometrics. Recall from the previous reading the additional assumption needed in multiple regression as opposed to a single regression: X variables are not perfectly correlated i.
When the X variables are perfectly correlated, it is called perfect collinearity. This would be the case when one of the independent variables can be perfectly characterized by a linear combination of other independent variables e. Multicollinearity refers to the condition when two or more of the independent variables, or linear combinations of the independent variables, in a multiple regression are highly correlated with each other.
While multicollinearity does not represent a violation of regression assumptions, its existence compromises the reliability of parameter estimates.
Effect of Multicollinearity on Regression Analysis
As a result of multicollinearity, there is a greater probability that we will incorrectly conclude that a variable is not statistically significant e.
Multicollinearity is likely to be present to some extent in most economic models. The issue is whether the multicollinearity has a significant effect on the regression results.
Detecting Multicollinearity
The most common sign of multicollinearity is the situation in which t-tests indicate that none of the individual coefficients is significantly different from zero, while the R2 is high and the F-test rejects the null hypothesis.
This suggests that the variables together explain much of the variation in the dependent variable, but the individual independent variables do not. Answer: The R2 is high, which suggests that the three variables as a group do an excellent job of explaining the variation in mutual fund returns. This is a classic indication of multicollinearity.
Another approach to identifying multicollinearity is to calculate the variance inflation factor (VIF) for each explanatory variable.
Correcting Multicollinearity
The most common method to correct for multicollinearity is to omit one or more of the correlated independent variables. Unfortunately, it is not always an easy task to identify the variable(s) that are the source of the multicollinearity. There are statistical procedures that may help in this effort, such as stepwise regression, which systematically removes variables from the regression until multicollinearity is minimized.
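The VIF for explanatory variable j is 1 / (1 - R2_j), where R2_j comes from regressing Xj on the other explanatory variables. A minimal sketch for the two-regressor case, where R2_j is simply the squared correlation between the two Xs; a commonly cited rule of thumb treats a VIF above 10 (R2_j above 0.9) as a sign of problematic multicollinearity. Function names and data are hypothetical.

```python
def r_squared_simple(xs, ys):
    """R2 of a single-regressor OLS fit, equal to the squared correlation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def vif(r2_j):
    """Variance inflation factor for regressor j, given the R2 from regressing
    X_j on the other explanatory variables."""
    return 1.0 / (1.0 - r2_j)
```

For x1 = (1, 2, 3, 4, 5) and x2 = (1, 2, 3, 4, 6), R2_j = 144/148 and the VIF is 37, far above that threshold.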
Effects of conditional heteroskedasticity include which of the following problems? I. The coefficient estimates in the regression model are biased. II. The standard errors are unreliable. A. I only B. II only C. Both I and II D. Neither I nor II 2. Hsu determines that the chi-squared statistic calculated using the R2 of the regression with the squared residuals as the dependent variable exceeds the chi-squared critical value. Which of the following is the most appropriate conclusion for Hsu to reach?
Hsu should estimate the White standard errors for use in hypothesis testing. OLS estimates and standard errors are consistent, unbiased, and reliable. OLS coefficients are biased but standard errors are reliable. A linear model is inappropriate to model the variation in the dependent variable.
Ben Strong recently joined Equity Partners as a junior analyst. Within a few weeks, Strong successfully modeled the movement of price for a hot stock using a multiple regression model. Variables X1 and X2 are highly correlated and should be combined into one variable. Variable X3 should be dropped from the model. Variable X2 should be dropped from the model. Variables X1 and X2 are not statistically significant. Which of the following statements regarding multicollinearity is least accurate? Multicollinearity may be present in any regression model.
Multicollinearity is not a violation of a regression assumption. Multicollinearity makes it difficult to determine the contribution to explanation of the dependent variable of an individual explanatory variable. If the t-statistics for the individual independent variables are insignificant, yet the F-statistic is significant, this indicates the presence of multicollinearity. Model specification is an art requiring a thorough understanding of the underlying economic theory that explains the behavior of the dependent variable.
For example, many factors may influence short-term interest rates, including inflation rate, unemployment rate, GDP growth rate, capacity utilization, and so forth. Omitting relevant factors from an ordinary least squares (OLS) regression can produce misleading or biased results.
Omitted variable bias is present when two conditions are met: (1) the omitted variable is correlated with other independent variables in the model, and (2) the omitted variable is a determinant of the dependent variable. When relevant variables are absent from a linear regression model, the results will likely lead to incorrect conclusions, as the OLS estimators may not accurately portray the actual data.
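A small simulation illustrates the bias. The data-generating process is an assumption chosen for illustration: y = x1 + x2 + e with x2 = 0.5·x1 + u, where x1, u, and e are independent standard normal draws. Both conditions above hold (x2 is correlated with x1 and determines y), so a regression of y on x1 alone should give a slope near 1 + 1 × 0.5 = 1.5 rather than the true 1, and a larger sample does not close the gap. Function names are hypothetical.

```python
import random


def slope(xs, ys):
    """Single-regressor OLS slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx


def short_regression_slope(n, seed=1):
    """Slope on x1 when the relevant, correlated x2 is left out of the model.

    Hypothetical process: y = x1 + x2 + e, with x2 = 0.5 * x1 + u.
    """
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [0.5 * a + rng.gauss(0.0, 1.0) for a in x1]
    y = [a + b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]
    return slope(x1, y)  # x2 is omitted, so x1 absorbs part of its effect
```

Both `short_regression_slope(5000)` and `short_regression_slope(20000, seed=2)` should land near 1.5, consistent with the statement that the bias does not shrink as the sample grows.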
The coefficients of the included variables that are correlated with the omitted variable will pick up part of the impact of the omitted variable (how much depends on the correlation between them), leading to biased estimates of the coefficients of those variables. The issue of omitted variable bias occurs regardless of the size of the sample and makes OLS estimators inconsistent. The correlation between the omitted variable and the included independent variables determines the size of the bias i.
The coefficients of the included independent variables therefore would be biased and inconsistent.
Bias-Variance Tradeoff
The holy grail of model specification is selecting the appropriate explanatory variables to include in the model.
Models with too many explanatory variables i. Overfit, larger models have a high bias error due to inclusion of too many independent variables.
Smaller models, on the other hand, have high in-sample variance errors i. There are two ways to deal with this bias-variance error tradeoff:
1. General-to-specific model: involves starting with the largest model and then successively dropping the independent variables that have the smallest absolute t-statistic.
2. m-fold cross-validation: a set of candidate models is first determined and then tested using this procedure to find the optimal model, the one with the lowest out-of-sample error.
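The general-to-specific approach can be sketched as a loop. This is a minimal, standard-library sketch under stated assumptions: the intercept is always kept, standard errors come from s^2 (X'X)^-1, and the cut-off |t| >= 2.0 is a hypothetical rule of thumb rather than a value from the text. The Gauss-Jordan matrix helper is only suitable for small, well-conditioned problems, and all names are hypothetical.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix given as a list of rows."""
    size = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def ols_t_stats(y, columns):
    """OLS of y on an intercept plus the given columns.

    Returns (coefficients, t-statistics), intercept first, using
    Var(b) = s^2 (X'X)^-1 with s^2 = RSS / (n - k), where k counts the intercept.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(beta[a] * X[i][a] for a in range(k)) for i in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)
    t_stats = [beta[a] / math.sqrt(s2 * inv[a][a]) for a in range(k)]
    return beta, t_stats


def general_to_specific(y, candidates, t_crit=2.0):
    """Start from the largest model and repeatedly drop the regressor with the
    smallest |t| until every remaining slope has |t| >= t_crit."""
    names = list(candidates)
    while names:
        _, t_stats = ols_t_stats(y, [candidates[name] for name in names])
        slope_t = t_stats[1:]  # the intercept is never dropped
        weakest = min(range(len(names)), key=lambda j: abs(slope_t[j]))
        if abs(slope_t[weakest]) >= t_crit:
            break
        names.pop(weakest)
    return names
```

In a constructed example where y = 3 + 2·x1 plus small alternating noise and x2 is unrelated to y, x2's coefficient is zero, it is dropped first, and the loop stops with only x1 remaining.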
Residual Plots
Ideally, the residuals should be small in magnitude and not related to any of the explanatory variables.
Alternatively, standardized residuals i.
Identifying Outliers
Recall that one of the assumptions of linear regression is that there are no outliers in the sample data. This is because the presence of outliers skews the estimated regression parameters. Outliers, when removed, induce large changes in the value of the estimated coefficients.
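The statement that removing an outlier produces a large change in the estimated coefficients can be checked directly by refitting the model without each observation in turn. A minimal single-regressor sketch; more formal influence measures, such as Cook's distance, build on the same idea. Function names and data are hypothetical.

```python
def simple_slope(xs, ys):
    """Single-regressor OLS slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx


def leave_one_out_slope_changes(xs, ys):
    """Absolute change in the OLS slope when each observation is dropped in turn."""
    full = simple_slope(xs, ys)
    changes = []
    for i in range(len(xs)):
        xs_i = xs[:i] + xs[i + 1:]
        ys_i = ys[:i] + ys[i + 1:]
        changes.append(abs(simple_slope(xs_i, ys_i) - full))
    return changes
```

For x = (1, ..., 6) and y = (1, 2, 3, 4, 5, 20), dropping the last point moves the slope from 3.0 to 1.0, the largest change produced by any single observation.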
Specifically, the relationship between Y and X(s) should be linear and residuals should be homoskedastic i. If there are no outliers, and the residuals have an expected value of zero, we can relax the assumption of normality for the residual distribution. The omitted variable bias results from A. Which of the following statements about bias-variance tradeoff is least accurate? Models with a large number of independent variables tend to have a high bias error. High variance error results when the R2 of a regression is high.
Models with fewer independent variables tend to have a high variance error. General-to-specific model is one approach to resolve the bias-variance tradeoff. Evaluate the following statements: I. Both statements are correct. Only statement I is correct. Only statement II is correct.
Both statements are incorrect. Even though the coefficient estimates are unbiased and consistent, the estimated standard errors are unreliable in the presence of conditional heteroskedasticity. The results of any hypothesis testing are therefore unreliable. In such a case, revised, White estimated standard errors should be used in hypothesis testing instead of the standard errors from OLS procedures.
Multicollinearity refers to when two or more of the independent variables, or linear combinations of the independent variables, in a multiple regression are highly correlated with each other. As a result of multicollinearity, there is a greater probability that we will incorrectly conclude that a variable is not statistically significant e.
One of the clues for the presence of multicollinearity is when there is a disconnect between t-tests for significance of individual slope coefficients and the F-test for the overall model. B Effects of heteroskedasticity include the following: (1) the standard errors are usually unreliable estimates and (2) the coefficient estimates are not affected. In such a case, the OLS estimates of standard errors would be unreliable and Hsu should estimate White corrected standard errors for use in hypothesis testing.
Coefficient estimates would still be reliable i. One of the approaches to overcoming the problem of multicollinearity is to drop the highly correlated variable.