2. The Simple Regression Model

2.1 Definition of the simple regression model

Much of applied econometric analysis begins with the following premise: y and x are two variables, representing some population. We are interested in "explaining y in terms of x," or in "studying how y varies with changes in x."

In writing down a model that will "explain y in terms of x," we must confront three issues. First, since there is never an exact relationship between two variables, how do we allow for other factors to affect y? Second, what is the functional relationship between y and x? And third, how can we be sure we are capturing a ceteris paribus relationship between y and x (if that is a desired goal)?

We can resolve these ambiguities by writing down an equation relating y to x.

A simple equation is y = beta_0 + beta_1*x + u. (2.1)

Equation (2.1), which is assumed to hold in the population of interest, defines the simple linear regression model. It is also called the two-variable linear regression model or bivariate linear regression model because it relates the two variables x and y.

When related by (2.1), the variables y and x have several different names used interchangeably, as follows. y is called the dependent variable, the explained variable, the response variable, the predicted variable, or the regressand. x is called the independent variable, the explanatory variable, the control variable, the predictor variable, or the regressor.

The variable u, called the error term or disturbance in the relationship, represents factors other than x that affect y. A simple regression analysis effectively treats all factors affecting y other than x as being unobserved. You can usefully think of u as standing for "unobserved."

Equation (2.1) also addresses the issue of the functional relationship between y and x. If the other factors in u are held fixed, so that the change in u is zero, delta u = 0, then x has a linear effect on y:

delta y = beta_1 * delta x if delta u = 0 (2.2)

Thus, the change in y is simply beta_1 multiplied by the change in x. This means that beta_1 is the slope parameter in the relationship between y and x, holding the other factors in u fixed; it is of primary interest in applied economics. The intercept parameter beta_0 also has its uses, although it is rarely central to an analysis.
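As a side illustration (not from the text, and with made-up parameter values), here is a minimal Python sketch of the population model and of the slope interpretation in (2.2):

# Minimal simulation of the population model y = beta_0 + beta_1*x + u.
# The parameter values are hypothetical, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)
beta_0, beta_1 = 1.0, 0.5
x = rng.normal(size=1_000)             # explanatory variable
u = rng.normal(size=1_000)             # unobserved factors ("u" for unobserved)
y = beta_0 + beta_1 * x + u

# Holding u fixed (delta u = 0), a one-unit increase in x changes y by beta_1:
u_fixed = 0.3
y_at_x2 = beta_0 + beta_1 * 2.0 + u_fixed
y_at_x3 = beta_0 + beta_1 * 3.0 + u_fixed
print(y_at_x3 - y_at_x2)               # prints 0.5, which is beta_1

The last two lines are just equation (2.2) in action: with u unchanged, delta y = beta_1 * delta x.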

Example 2.2 (a simple wage equation). A model relating a person's wage to observed education and other unobserved factors is: wage = beta_0 + beta_1*educ + u. If wage is measured in dollars per hour and educ is years of education, then beta_1 measures the change in hourly wage given another year of education, holding all other factors fixed. Some of those factors include labor force experience, innate ability, tenure with current employer, work ethic, and innumerable other things.

The linearity of (2.1) implies that a one-unit change in x has the same effect on y, regardless of the initial value of x. This is unrealistic for many economic applications. For example, in the wage-education example, we might want to allow for increasing returns: the next year of education has a larger effect on wages than did the previous year. We will see how to allow for such possibilities in Section 2.4.

The most difficult issue to address is whether model (2.1) really allows us to draw ceteris paribus conclusions about how x affects y. We just saw in equation (2.2) that beta_1 does measure the effect of x on y, holding all other factors (in u) fixed. Is this the end of the causality issue? Unfortunately, no. How can we hope to learn in general about the ceteris paribus effect of x on y, holding other factors fixed, when we are ignoring all those other factors?

Section 2.5 will show that we are only able to get reliable estimators of beta_0 and beta_1 from a random sample of data when we make an assumption restricting how the unobservable u is related to the explanatory variable x. Without such a restriction, we will not be able to estimate the ceteris paribus effect, beta_1. Because u and x are random variables, we need a concept grounded in probability.

Before we state the key assumption about how x and u are related, we can always make one assumption about u. As long as the intercept beta_0 is included in the equation, nothing is lost by assuming that the average value of u in the population is zero.

Mathematically, E(u) = 0 (2.5)
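A quick side note (my own sketch of the standard argument, not quoted from the text): if E(u) were some nonzero constant alpha_0, we could always fold it into the intercept,

\[
y = \beta_0 + \beta_1 x + u = (\beta_0 + \alpha_0) + \beta_1 x + (u - \alpha_0),
\]

so the redefined error term u - alpha_0 has expected value zero while the slope beta_1 is unchanged; only the intercept absorbs the shift. This is why nothing is lost by the normalization.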

Assumption (2.5) says nothing about the relationship between u and x, but simply makes a statement about the distribution of the unobservables in the population. We now turn to the crucial assumption regarding how u and x are related. A natural measure of the association between two random variables is the correlation coefficient.

If u and x are uncorrelated, then, as random variables, they are not linearly related. Assuming that u and x are uncorrelated goes a long way toward defining the sense in which u and x should be unrelated in equation (2.1).

(...) A better assumption involves the expected value of u given x.

Because u and x are random variables, we can define the conditional distribution of u given any value of x. 

* a random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon.

* a conditional distribution: the concept of conditional distribution of a random variable combines the concept of distribution of a random variable and the concept of conditional probability.

* a conditional probability is a probability with some condition imposed. 

Because u and x are random variables, we can define the conditional distribution of u given any value of x. In particular, for any x, we can obtain the expected (or average) value of u for that slice of the population described by the value of x. The crucial assumption is that the average value of u does not depend on the value of x. We can write this as:

E( u | x ) = E( u ) = 0 (2.6)

where the second equality follows from assumption (2.5). The first equality in equation (2.6) is the new assumption. It says that, for any given value of x, the average of the unobservables is the same and therefore must equal the average value of u in the population. When we combine the first equality in equation (2.6) with assumption (2.5), we obtain the zero conditional mean assumption.

When E( u | x ) = E( u ) = 0 is true, it is useful to break y into two components. The piece beta_0 + beta_1 * x is sometimes called the systematic part of y - that is, the part of y explained by x - and u is called the unsystematic part, or the part of y not explained by x. 
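To make (2.6) concrete, here is a minimal sketch (my own illustration, with x and u generated independently so the assumption holds by construction): slice a simulated population by the value of x and check that the average of u is roughly zero in every slice.

# Check E(u | x) = 0 by averaging u within slices (bins) of x.
import numpy as np

rng = np.random.default_rng(1)
beta_0, beta_1 = 1.0, 0.5                  # hypothetical values, as before
x = rng.uniform(0.0, 10.0, size=100_000)
u = rng.normal(0.0, 1.0, size=100_000)     # drawn independently of x
y = beta_0 + beta_1 * x + u

edges = np.linspace(0.0, 10.0, 11)         # ten slices of the population by x
which = np.digitize(x, edges)
for b in range(1, 11):
    print(b, round(u[which == b].mean(), 3))   # each slice average is close to 0

# The decomposition of y: systematic part (explained by x) plus unsystematic part.
systematic = beta_0 + beta_1 * x
unsystematic = y - systematic              # equals u by construction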

2.2 Deriving the ordinary least squares estimates

We will address the important issue of how to estimate the parameters beta_0 and beta_1 in equation (2.1). To do this, we need a sample from the population. Let {(x_i, y_i): i = 1, 2, 3, ..., n} denote a random sample of size n from the population.

* random sample: a randomly chosen sample.
Random sampling is one of the simplest forms of collecting data from the total population. Under random sampling, each member of the population has an equal chance of being chosen as part of the sample.
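A minimal sketch of drawing such a random sample (my own illustration; the simulated population and its parameters are made up):

# Draw a random sample {(x_i, y_i): i = 1, ..., n} from a simulated population.
import numpy as np

rng = np.random.default_rng(2)
beta_0, beta_1 = 1.0, 0.5                            # hypothetical parameters
pop_x = rng.uniform(0.0, 10.0, size=1_000_000)       # the "population"
pop_y = beta_0 + beta_1 * pop_x + rng.normal(size=1_000_000)

n = 500
idx = rng.choice(pop_x.size, size=n, replace=False)  # every member equally likely
x_sample, y_sample = pop_x[idx], pop_y[idx]          # the random sample of size n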

* covariance is a quantitative measure of the extent to which the deviation of one variable from its mean matches the deviation of the other from its mean. Covariance is a measure of how much two random variables vary together. A large covariance can mean a strong relationship between variables. However, you can't compare covariances across data sets with different scales.
It is a mathematical relationship that is defined as: Cov(X, Y) = E[(X - E[X])(Y - E[Y])]
If two random variables are independent, their covariance is 0 (a short numerical sketch follows these definitions).

* correlation: the normalized version of the covariance. The correlation coefficient tells us both the direction and the strength of the linear relationship between two variables.
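A short numpy sketch of both definitions (illustrative only), showing that the correlation coefficient is just the covariance rescaled by the two standard deviations:

# Covariance Cov(X, Y) = E[(X - E[X])(Y - E[Y])] and its normalized version.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)            # y moves with x, so Cov(x, y) > 0

cov = np.mean((x - x.mean()) * (y - y.mean()))   # sample analogue of the definition
corr = cov / (x.std() * y.std())                 # normalized; always in [-1, 1]
print(cov, corr)

# The library versions give the same numbers.
print(np.cov(x, y, ddof=0)[0, 1], np.corrcoef(x, y)[0, 1])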

(p. 27) to be continued 
