Two Stage Least Squares

Two Stage Least Squares (TSLS) is one of the simplest topics in mathematics. Elementary things are often explained in the most complicated way possible, but the usual explanations of TSLS are extreme even by that standard. Everything becomes clear once strict linear algebra notation is used. Here we try to follow the classical way of presenting linear algebra problems.

Let us consider a simple linear regression, where the current value y_i depends on the observations x_i and x_i-1

y₂ = a₀ * x₂ + a₁ * x₁ + e₁
y₃ = a₀ * x₃ + a₁ * x₂ + e₂
y₄ = a₀ * x₄ + a₁ * x₃ + e₃
...
y_n = a₀ * x_n + a₁ * x_n-1 + e_n-1

where e_k is an error. These errors prevent ordinary least squares (OLS) from estimating the parameters a₀ and a₁ accurately. The problem is addressed by bringing in additional information about the observations x_i, provided that it is available or can be obtained

x₁ = b₀ * z₁ + b₁ * z₀ + err₁
x₂ = b₀ * z₂ + b₁ * z₁ + err₂
x₃ = b₀ * z₃ + b₁ * z₂ + err₃
...
x_n = b₀ * z_n + b₁ * z_n-1 + err_n

where the z_k are new observable values and the err_k are errors that are not as bad as e_k and can be filtered out by applying OLS.
If we express the first system in matrix notation, we get the following

y = X a + e
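
As a minimal sketch of how the vector y and the matrix X could be assembled from data, the NumPy fragment below stores x_k at array position k, so position 0 is an unused placeholder. The array names, the sample numbers and the values of a₀ and a₁ are illustrative assumptions, and the error term is left out.

```python
import numpy as np

# Illustrative data: position k of the array holds x_k, for k = 1..n.
# Position 0 is a placeholder, because the first system starts at x_1.
x = np.array([np.nan, 1.0, 3.0, 2.0, 5.0, 4.0])   # x_1 .. x_5, so n = 5
a = np.array([2.0, -1.0])                          # assumed a_0, a_1

# The row that belongs to y_k is [x_k, x_{k-1}], for k = 2..n.
X = np.column_stack([x[2:], x[1:-1]])

# Without the error term e, the system reads y = X a.
y = X @ a

print(X)   # one row per scalar equation
print(y)
```

The slices x[2:] and x[1:-1] are what keeps the two columns of X shifted by exactly one index, as in the scalar equations.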

We use bold font for vectors and matrices, lower case letters for vectors and upper case letters for matrices. As will be seen, following standard notation makes the explanation simple and clear.
Now we rewrite the additional information above by rearranging its elements and assembling them into a new matrix equation. It is given in scalar notation first

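\[
\begin{pmatrix}
z_0 & z_1 & z_2 \\
z_1 & z_2 & z_3 \\
z_2 & z_3 & z_4 \\
\vdots & \vdots & \vdots \\
z_{n-2} & z_{n-1} & z_n
\end{pmatrix}
\begin{pmatrix}
0 & b_1 \\
b_1 & b_0 \\
b_0 & 0
\end{pmatrix}
=
\begin{pmatrix}
x_2 & x_1 \\
x_3 & x_2 \\
x_4 & x_3 \\
\vdots & \vdots \\
x_n & x_{n-1}
\end{pmatrix}
\]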

The errors err_k are dropped, so the equality holds only approximately. Rewriting this equation in matrix notation gives the following

Z B = X
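
The index bookkeeping behind Z B = X is easy to get wrong, so here is a small sketch that checks it numerically. It assumes error-free data and arbitrary illustrative values for b₀ and b₁; the array names are hypothetical. Position k of each array holds the element with index k.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
b0, b1 = 1.0, 0.5                      # assumed values, for illustration only

z = rng.normal(size=n + 1)             # z[k] holds z_k, k = 0..n
x = np.full(n + 1, np.nan)             # x[k] holds x_k, k = 1..n
x[1:] = b0 * z[1:] + b1 * z[:-1]       # x_k = b0*z_k + b1*z_{k-1}, errors dropped

# The row for index k (k = 2..n) is [z_{k-2}, z_{k-1}, z_k].
Z = np.column_stack([z[:-2], z[1:-1], z[2:]])
B = np.array([[0.0, b1],
              [b1,  b0],
              [b0,  0.0]])
X = np.column_stack([x[2:], x[1:-1]])  # rows [x_k, x_{k-1}]

print(np.allclose(Z @ B, X))           # prints True
```

The first column of Z B reproduces x_k and the second reproduces x_k-1, which is why B carries the same two coefficients shifted by one row.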

Now we can clearly see that the matrix X is identical in both matrix equations, y = X a + e and Z B = X, while the matrix B and the vector a are unknown. Of the two, only a is of interest, and the errors e prevent us from finding it by OLS directly. An elementary trick resolves this simple algebraic problem. First we find the matrix B by applying OLS.

B = (Z^T Z)^-1 Z^T X
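
A hedged sketch of this step with NumPy follows. The data are simulated under assumed values of b₀ and b₁ with small errors err_k, and the formula is evaluated both through the normal equations and through `np.linalg.lstsq`, which solves the same least squares problem without forming Z^T Z.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
b0, b1 = 1.0, 0.5                       # assumed true values
B_true = np.array([[0.0, b1], [b1, b0], [b0, 0.0]])

z = rng.normal(size=n + 1)              # z[k] holds z_k, k = 0..n
err = 0.1 * rng.normal(size=n)          # small errors err_1..err_n
x = np.full(n + 1, np.nan)              # x[k] holds x_k, k = 1..n
x[1:] = b0 * z[1:] + b1 * z[:-1] + err

Z = np.column_stack([z[:-2], z[1:-1], z[2:]])
X = np.column_stack([x[2:], x[1:-1]])

# Normal equations, as in the formula B = (Z^T Z)^-1 Z^T X.
B_normal = np.linalg.solve(Z.T @ Z, Z.T @ X)

# The same least squares problem, solved without forming Z^T Z.
B_lstsq, *_ = np.linalg.lstsq(Z, X, rcond=None)

print(np.round(B_normal, 2))
```

Note that the estimate is unconstrained: the two entries that are exactly zero in B come out small but not exactly zero.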

Now we put the newly found B into the previous matrix equation. The reason for doing so will become clear further on.

Z (Z^T Z)^-1 Z^T X = X

At this point we replace X in the original equation by the left-hand side of the above expression. The result is the following

y = Z (Z^T Z)^-1 Z^T X a + e

And now we multiply both sides by Z^T. On the right-hand side the product Z^T Z (Z^T Z)^-1 reduces to the identity matrix, which leaves

Z^T y = Z^T X a + Z^T e
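
The cancellation used in this step can be checked numerically. The sketch below uses random matrices as stand-ins (an assumption made purely for illustration) and shows that Z (Z^T Z)^-1 Z^T X generally differs from X, yet the two coincide after multiplication by Z^T.

```python
import numpy as np

rng = np.random.default_rng(2)
Z = rng.normal(size=(50, 3))            # stand-in with full column rank
X = rng.normal(size=(50, 2))            # X need not equal Z B exactly

B = np.linalg.solve(Z.T @ Z, Z.T @ X)   # B = (Z^T Z)^-1 Z^T X
X_fit = Z @ B                           # Z (Z^T Z)^-1 Z^T X

print(np.allclose(X_fit, X))               # prints False
print(np.allclose(Z.T @ X_fit, Z.T @ X))   # prints True
```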

Since the matrix Z has three columns, the vector Z^T y has three elements and the matrix Z^T X has size 3 by 2. If the columns of Z are correlated with the columns of X and with the vector y, and not correlated with the error e, then the elements of Z^T X and Z^T y must be significantly larger than the elements of Z^T e. This provides effective filtering of the errors e, which is the goal of the method. The latter system, three equations in the two unknowns a₀ and a₁, can be solved by OLS. It is critical to pay attention to the indexes in the data arrays when building the matrices of the system. While the columns of Z can be rearranged, the indexes of both X and Z must be chosen in a coordinated way.
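
The whole procedure can be tried on simulated data. The sketch below is only an illustration under stated assumptions: the true values of a and b are made up, and the error e is built to share a component with x_k so that it is correlated with X but not with Z. It compares plain OLS on y = X a + e with the least squares solution of Z^T y = Z^T X a. A third estimate applies least squares directly to y = Z (Z^T Z)^-1 Z^T X a + e, which is the form usually given in textbooks, and is included only for comparison.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
a_true = np.array([2.0, -1.0])          # assumed a_0, a_1
b0, b1 = 1.0, 0.5                       # assumed b_0, b_1

z = rng.normal(size=n + 1)              # z[k] holds z_k, k = 0..n
v = rng.normal(size=n + 1)              # v[k] plays the role of err_k
x = np.full(n + 1, np.nan)              # x[k] holds x_k, k = 1..n
x[1:] = b0 * z[1:] + b1 * z[:-1] + v[1:]

# Rows correspond to k = 2..n; X and Z use coordinated indexes.
X = np.column_stack([x[2:], x[1:-1]])
Z = np.column_stack([z[:-2], z[1:-1], z[2:]])

# Assumption: e shares the component v_k with x_k, so e is correlated
# with the first column of X but not with the columns of Z.
e = 0.8 * v[2:] + 0.3 * rng.normal(size=n - 1)
y = X @ a_true + e

# Plain OLS on y = X a + e.
a_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Least squares solution of Z^T y = Z^T X a (3 equations, 2 unknowns).
a_tsls, *_ = np.linalg.lstsq(Z.T @ X, Z.T @ y, rcond=None)

# Least squares applied to y = Z (Z^T Z)^-1 Z^T X a + e.
X_fit = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
a_fit, *_ = np.linalg.lstsq(X_fit, y, rcond=None)

print("true   ", a_true)
print("OLS    ", np.round(a_ols, 3))
print("TSLS   ", np.round(a_tsls, 3))
print("fitted ", np.round(a_fit, 3))
```

With this setup the OLS estimate of a₀ is visibly off, while both estimates that use Z land close to the assumed values. Shifting the slices of x or z by one position breaks the coordination between X and Z and spoils the result.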

The last part of the explanation of TSLS introduces the widely used terminology. I believe that terminology should be explained at the end of an article, when the reader already understands the method, and not in the first paragraph, as other technical writers do. The elements of matrix Z are called instrumental variables, or instruments. They are always denoted by Z to make them distinct. The elements of the vectors x and y are called endogenous variables. The name comes from economics. It is used for quantities that influence each other, so that neither is strictly dependent or independent. An example is demand and price in a market: demand influences the price, and the price affects the demand. This is the property attributed to the original system: the error e cannot be filtered by OLS because x and y are endogenous. They influence each other, or are related in such a way that the original system has these specific errors, which make OLS ineffective. The opposite of an endogenous variable is an exogenous variable.

Feb, 2019.