Two Stage Least Squares

Two Stage Least Squares (TSLS) is one of the simplest topics in mathematics. Elementary things are very often explained in the most complicated way possible, but the usual explanations of TSLS are extraordinary even by that standard. To make everything clear, it is enough to use strict linear algebra notation. Here we follow the classical way of explaining linear algebra problems.

Consider a simple linear regression in which the current value $y_k$ depends on the observations $x_k$ and $x_{k-1}$:

$$
\begin{aligned}
y_2 &= a_0 x_2 + a_1 x_1 + e_1 \\
y_3 &= a_0 x_3 + a_1 x_2 + e_2 \\
y_4 &= a_0 x_4 + a_1 x_3 + e_3 \\
&\;\;\vdots \\
y_n &= a_0 x_n + a_1 x_{n-1} + e_{n-1}
\end{aligned}
$$

where $e_k$ is an error. These errors are correlated with the observations $x_k$, which prevents ordinary least squares (OLS) from estimating the parameters $a_0$ and $a_1$ accurately. The problem is addressed by bringing in new information about the observations $x_k$, provided that it is available or can be obtained:

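$$
\begin{aligned}
x_1 &= b_0 z_1 + b_1 z_0 + \mathrm{err}_1 \\
x_2 &= b_0 z_2 + b_1 z_1 + \mathrm{err}_2 \\
x_3 &= b_0 z_3 + b_1 z_2 + \mathrm{err}_3 \\
&\;\;\vdots \\
x_n &= b_0 z_n + b_1 z_{n-1} + \mathrm{err}_n
\end{aligned}
$$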

where $z_k$ are new observable values and $\mathrm{err}_k$ are errors that are not as harmful as $e_k$: they are not correlated with $z_k$, so OLS can filter them out.
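To see the problem numerically, here is a small simulation sketch in Python with NumPy (our choice of tools, not the article's). All numbers are assumptions for illustration: $a_0 = 2$, $a_1 = -1$, $b_0 = 1.5$, $b_1 = 0.8$. The hypothetical array `u` stands in for the errors $e$ (`u[k]` is the error of the $y_k$ equation, i.e. $e_{k-1}$), and `err` is built to share a component with `u`, which is what makes $x_k$ correlated with the error of the first system. OLS then recovers $b_0$ and $b_1$, but not $a_0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
a0, a1 = 2.0, -1.0        # assumed "true" coefficients of the y system
b0, b1 = 1.5, 0.8         # assumed coefficients of the x system

z = rng.normal(size=n + 1)                    # z_0 .. z_n
u = rng.normal(size=n + 1)                    # u[k] is the error of the y_k equation
err = 0.8 * u + 0.6 * rng.normal(size=n + 1)  # err_k shares u_k, so x_k is correlated with it

x = np.zeros(n + 1)                           # x[0] is unused
k1 = np.arange(1, n + 1)                      # x system: k = 1..n
x[k1] = b0 * z[k1] + b1 * z[k1 - 1] + err[k1]

k2 = np.arange(2, n + 1)                      # y system: k = 2..n
y = a0 * x[k2] + a1 * x[k2 - 1] + u[k2]

# OLS on the x system: err_k is uncorrelated with z, so b0, b1 are recovered
b_ols = np.linalg.lstsq(np.column_stack([z[k1], z[k1 - 1]]), x[k1], rcond=None)[0]
# OLS on the y system: x_k is correlated with the error, so a0 is biased
a_ols = np.linalg.lstsq(np.column_stack([x[k2], x[k2 - 1]]), y, rcond=None)[0]

print(b_ols)   # close to [1.5, 0.8]
print(a_ols)   # a0 noticeably above 2.0
```

With these assumed numbers the OLS estimate of $a_0$ lands at roughly 2.2 instead of 2.0, while $b_0$ and $b_1$ come out close to their true values.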
Writing the first system in matrix notation gives

$$\mathbf{y} = \mathbf{X}\mathbf{a} + \mathbf{e}$$

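We use bold font for vectors and matrices, lower-case letters for vectors and upper-case letters for matrices. Here $\mathbf{y} = (y_2, \dots, y_n)^T$, $\mathbf{a} = (a_0, a_1)^T$, $\mathbf{e} = (e_1, \dots, e_{n-1})^T$, and the row of $\mathbf{X}$ for $k = 2, \dots, n$ is $(x_k, x_{k-1})$. As will be seen, following standard notation keeps the explanation simple and clear.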
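A minimal sketch of how $\mathbf{X}$ can be assembled from the series $x_1, \dots, x_n$, assuming NumPy; `design_matrix` is a hypothetical helper name, and the numbers are made up.

```python
import numpy as np

def design_matrix(x):
    """Build X with rows (x_k, x_{k-1}), k = 2..n, from x = [x_1, ..., x_n]."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x[1:], x[:-1]])

x = [1.0, 2.0, 4.0, 7.0]      # x_1 .. x_4, made-up numbers
a = np.array([2.0, -1.0])     # (a_0, a_1), assumed values
X = design_matrix(x)          # 3 x 2: rows (x_2, x_1), (x_3, x_2), (x_4, x_3)
y_clean = X @ a               # y_2 .. y_4 without the errors e_k: 3, 6, 10
```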
Now we rearrange the additional equations for $x_k$ into a new matrix equation, written out element by element first:

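$$
\begin{pmatrix}
z_0 & z_1 & z_2 \\
z_1 & z_2 & z_3 \\
z_2 & z_3 & z_4 \\
\vdots & \vdots & \vdots \\
z_{n-2} & z_{n-1} & z_n
\end{pmatrix}
\begin{pmatrix}
0 & b_1 \\
b_1 & b_0 \\
b_0 & 0
\end{pmatrix}
=
\begin{pmatrix}
x_2 & x_1 \\
x_3 & x_2 \\
x_4 & x_3 \\
\vdots & \vdots \\
x_n & x_{n-1}
\end{pmatrix}
$$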

The errors $\mathrm{err}_k$ are dropped, so the equality holds only approximately. In matrix notation it reads

$$\mathbf{Z}\mathbf{B} \approx \mathbf{X}$$
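The rearrangement can be checked numerically. This sketch (NumPy, made-up values for $z_k$, assumed $b_0 = 1.5$, $b_1 = 0.8$) builds $\mathbf{Z}$, $\mathbf{B}$ and $\mathbf{X}$ exactly as in the element-wise equation above, with the errors $\mathrm{err}_k$ dropped, and confirms that $\mathbf{Z}\mathbf{B}$ reproduces $\mathbf{X}$.

```python
import numpy as np

b0, b1 = 1.5, 0.8                          # assumed first-stage coefficients
z = np.array([0.3, -1.2, 0.7, 2.0, -0.5])  # z_0 .. z_4 (made-up numbers), so n = 4

# x_k = b0 z_k + b1 z_{k-1} with the errors err_k dropped, k = 1..n
x = b0 * z[1:] + b1 * z[:-1]               # x_1 .. x_4

# Z: rows (z_{k-2}, z_{k-1}, z_k) for k = 2..n
Z = np.column_stack([z[:-2], z[1:-1], z[2:]])
B = np.array([[0.0, b1],
              [b1, b0],
              [b0, 0.0]])
# X: rows (x_k, x_{k-1}) for k = 2..n
X = np.column_stack([x[1:], x[:-1]])

print(np.allclose(Z @ B, X))   # True
```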

The matrix $\mathbf{X}$ is the same in both equations, $\mathbf{y} = \mathbf{X}\mathbf{a} + \mathbf{e}$ and $\mathbf{Z}\mathbf{B} \approx \mathbf{X}$. Both $\mathbf{B}$ and $\mathbf{a}$ are unknown; only $\mathbf{a}$ is of interest to the researcher, and the errors $\mathbf{e}$ prevent estimating it with OLS. An elementary trick resolves this simple algebraic problem: first we find $\mathbf{B}$ by OLS.

$$\mathbf{B} = (\mathbf{Z}^T\mathbf{Z})^{-1}\mathbf{Z}^T\mathbf{X}$$
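A sketch of this first stage on simulated data (NumPy; the coefficients and the noise level 0.5 are assumptions). It computes $\mathbf{B}$ from the normal equations and, for comparison, with `numpy.linalg.lstsq`, which fits each column of $\mathbf{X}$ separately. OLS estimates all six elements of $\mathbf{B}$ freely: the zeros and the repeated $b_0$, $b_1$ are not imposed, they only come out approximately.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
b0, b1 = 1.5, 0.8                               # assumed first-stage coefficients
z = rng.normal(size=n + 1)                      # z_0 .. z_n
x = b0 * z[1:] + b1 * z[:-1] + 0.5 * rng.normal(size=n)   # x_1 .. x_n, with errors err_k

Z = np.column_stack([z[:-2], z[1:-1], z[2:]])   # rows (z_{k-2}, z_{k-1}, z_k), k = 2..n
X = np.column_stack([x[1:], x[:-1]])            # rows (x_k, x_{k-1}),          k = 2..n

# B = (Z^T Z)^{-1} Z^T X, computed by solving the normal equations
B_normal = np.linalg.solve(Z.T @ Z, Z.T @ X)
# The same OLS fit, column by column, via a least-squares solver
B_lstsq = np.linalg.lstsq(Z, X, rcond=None)[0]

print(np.round(B_normal, 2))   # approximately [[0, 0.8], [0.8, 1.5], [1.5, 0]]
```

Solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly; `lstsq` is the numerically safer choice when $\mathbf{Z}$ is poorly conditioned.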

Now we substitute the newly found $\mathbf{B}$ back into the previous matrix equation; the purpose of this step becomes clear further in the explanation.

$$\mathbf{Z}(\mathbf{Z}^T\mathbf{Z})^{-1}\mathbf{Z}^T\mathbf{X} \approx \mathbf{X}$$
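The matrix $\mathbf{Z}(\mathbf{Z}^T\mathbf{Z})^{-1}\mathbf{Z}^T$ is the orthogonal projection onto the columns of $\mathbf{Z}$. This small sketch (NumPy, simulated data with assumed coefficients; `P` and `X_hat` are our names) checks two of its properties: applying it twice changes nothing, and the part of $\mathbf{X}$ it leaves out is orthogonal to every column of $\mathbf{Z}$. For large samples one would compute `Z @ B` rather than form the full square matrix `P`.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
b0, b1 = 1.5, 0.8                               # assumed first-stage coefficients
z = rng.normal(size=n + 1)                      # z_0 .. z_n
x = b0 * z[1:] + b1 * z[:-1] + 0.5 * rng.normal(size=n)   # x_1 .. x_n

Z = np.column_stack([z[:-2], z[1:-1], z[2:]])   # rows (z_{k-2}, z_{k-1}, z_k)
X = np.column_stack([x[1:], x[:-1]])            # rows (x_k, x_{k-1})

# P = Z (Z^T Z)^{-1} Z^T, the projection onto the columns of Z
P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
X_hat = P @ X                                   # left-hand side of the expression above

print(np.allclose(P @ P, P))                    # True: projecting twice changes nothing
print(np.allclose(Z.T @ (X - X_hat), 0))        # True: the remainder is orthogonal to Z
```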

At this point we replace $\mathbf{X}$ in the original equation by the left-hand side of the expression above, which gives

$$\mathbf{y} = \mathbf{Z}(\mathbf{Z}^T\mathbf{Z})^{-1}\mathbf{Z}^T\mathbf{X}\mathbf{a} + \mathbf{e}$$
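This equation is an ordinary regression of $\mathbf{y}$ on the fitted matrix $\mathbf{Z}(\mathbf{Z}^T\mathbf{Z})^{-1}\mathbf{Z}^T\mathbf{X}$, so OLS can also be applied to it directly; this is the usual "second stage". A self-contained simulation sketch (NumPy; the values $a_0 = 2$, $a_1 = -1$, $b_0 = 1.5$, $b_1 = 0.8$ and the hypothetical error arrays `u` and `err` are assumptions) compares it with plain OLS on the original equation.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
a0, a1 = 2.0, -1.0                            # assumed "true" coefficients
b0, b1 = 1.5, 0.8                             # assumed first-stage coefficients

z = rng.normal(size=n + 1)                    # z_0 .. z_n
u = rng.normal(size=n + 1)                    # u[k]: error of the y_k equation
err = 0.8 * u + 0.6 * rng.normal(size=n + 1)  # shares u_k, so x_k is endogenous

x = np.zeros(n + 1)                           # x[0] is unused
k1 = np.arange(1, n + 1)
x[k1] = b0 * z[k1] + b1 * z[k1 - 1] + err[k1] # x_1 .. x_n

k = np.arange(2, n + 1)                       # equations for k = 2..n
y = a0 * x[k] + a1 * x[k - 1] + u[k]
Z = np.column_stack([z[k - 2], z[k - 1], z[k]])
X = np.column_stack([x[k], x[k - 1]])

# First stage: fitted X_hat = Z (Z^T Z)^{-1} Z^T X, computed as Z @ B
B = np.linalg.lstsq(Z, X, rcond=None)[0]
X_hat = Z @ B
# Second stage: OLS of y on X_hat
a_tsls = np.linalg.lstsq(X_hat, y, rcond=None)[0]
# Plain OLS of y on X, for comparison
a_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print(a_tsls)   # close to [2.0, -1.0]
print(a_ols)    # a0 biased upward
```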

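Now we multiply both sides on the left by $\mathbf{Z}^T$. Since $\mathbf{Z}^T\mathbf{Z}(\mathbf{Z}^T\mathbf{Z})^{-1} = \mathbf{I}$, the projection disappears and we get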

$$\mathbf{Z}^T\mathbf{y} = \mathbf{Z}^T\mathbf{X}\mathbf{a} + \mathbf{Z}^T\mathbf{e}$$
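A quick numerical check of the cancellation used in this step, with random matrices standing in for $\mathbf{Z}$ and $\mathbf{X}$ (NumPy; any $\mathbf{Z}$ with linearly independent columns would do):

```python
import numpy as np

rng = np.random.default_rng(5)
Z = rng.normal(size=(50, 3))   # stand-in instruments (random, full column rank)
X = rng.normal(size=(50, 2))   # stand-in regressors

# Z (Z^T Z)^{-1} Z^T X
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)

# Multiplying by Z^T: Z^T Z (Z^T Z)^{-1} = I, so only Z^T X is left
print(np.allclose(Z.T @ X_hat, Z.T @ X))   # True
```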

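Since the matrix $\mathbf{Z}$ has three columns, the vector $\mathbf{Z}^T\mathbf{y}$ has three elements and the matrix $\mathbf{Z}^T\mathbf{X}$ is 3 by 2. If the columns of $\mathbf{Z}$ are correlated with the columns of $\mathbf{X}$ and with the vector $\mathbf{y}$, but not with the error $\mathbf{e}$, then the elements of $\mathbf{Z}^T\mathbf{X}$ and $\mathbf{Z}^T\mathbf{y}$ must be significantly larger than the elements of $\mathbf{Z}^T\mathbf{e}$: with $n$ observations, the former grow roughly like $n$ and the latter only like $\sqrt{n}$. Dropping $\mathbf{Z}^T\mathbf{e}$ therefore filters out the errors $\mathbf{e}$ effectively, and that is the goal of the method. The remaining system of three equations in two unknowns can be solved by OLS. The textbook TSLS estimator solves the same system with the weight matrix $(\mathbf{Z}^T\mathbf{Z})^{-1}$, which is equivalent to applying OLS directly to the previous equation for $\mathbf{y}$; plain and weighted OLS give the same answer when $\mathbf{Z}$ has as many columns as $\mathbf{X}$. It is critical to pay attention to the indices in the data arrays when building the matrices of the system: the columns of $\mathbf{Z}$ can be rearranged, but the indices of $\mathbf{X}$ and $\mathbf{Z}$ must be chosen in a coordinated way.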
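A sketch of this last step on simulated data (NumPy; all coefficients and the construction of the hypothetical error arrays `u` and `err` are assumptions for illustration). It forms the 3-by-2 matrix $\mathbf{Z}^T\mathbf{X}$ and the 3-element vector $\mathbf{Z}^T\mathbf{y}$, solves the overdetermined system by plain OLS, and for comparison computes the weighted (textbook TSLS) solution.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 20_000
a0, a1 = 2.0, -1.0                            # assumed "true" coefficients
b0, b1 = 1.5, 0.8                             # assumed first-stage coefficients

z = rng.normal(size=n + 1)                    # z_0 .. z_n
u = rng.normal(size=n + 1)                    # u[k]: error of the y_k equation
err = 0.8 * u + 0.6 * rng.normal(size=n + 1)  # makes x_k endogenous

x = np.zeros(n + 1)                           # x[0] is unused
k1 = np.arange(1, n + 1)
x[k1] = b0 * z[k1] + b1 * z[k1 - 1] + err[k1]

k = np.arange(2, n + 1)
y = a0 * x[k] + a1 * x[k - 1] + u[k]
Z = np.column_stack([z[k - 2], z[k - 1], z[k]])   # instruments: 3 columns
X = np.column_stack([x[k], x[k - 1]])             # endogenous regressors: 2 columns

A, c = Z.T @ X, Z.T @ y                   # A is 3 x 2, c has 3 elements
# Three equations, two unknowns: solve c = A a by plain OLS
a_sys = np.linalg.lstsq(A, c, rcond=None)[0]
# Textbook TSLS: the same system weighted by W = (Z^T Z)^{-1}
W = np.linalg.inv(Z.T @ Z)
a_tsls = np.linalg.solve(A.T @ W @ A, A.T @ W @ c)

print(a_sys)    # close to [2.0, -1.0]
print(a_tsls)   # close to [2.0, -1.0]; not identical to a_sys in general
```

Here the simulated instruments are independent with unit variance, so $\mathbf{Z}^T\mathbf{Z}$ is close to a multiple of the identity and the two estimates nearly coincide; with strongly correlated instruments they can differ more.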

The last part of the explanation of TSLS introduces the widely used terminology. I believe terminology should be explained at the end of an article, once the reader already understands the method, and not in the first paragraph, as other technical writers do. The elements of the matrix $\mathbf{Z}$ are called instrumental variables, or instruments. They are conventionally denoted $\mathbf{Z}$ to keep them distinct. The variables $x$ and $y$ are called endogenous. The name comes from economics, where it is used for quantities that influence each other, so that neither is simply dependent or independent. For example, take demand and price in a market: demand influences the price, and the price affects the demand. This is the property of the original system: the error $\mathbf{e}$ cannot be filtered by OLS because $x$ and $y$ are endogenous. They influence each other, or are related in such a way that the errors of the original system are correlated with the regressors, which makes OLS ineffective. The opposite of an endogenous variable is an exogenous variable.


Feb, 2019.