Two Stage Least Squares

Two Stage Least Squares (TSLS) is one of the simplest topics in mathematics. Elementary things are often explained in the most complicated way possible, but the way TSLS is usually explained is out of the ordinary even by that standard. To make everything clear, it is enough to use the strict notation of linear algebra. Here we try to follow the classical way of explaining linear algebra problems.

Let us consider a simple linear regression in which the current value y_{i} depends on the observations x_{i} and x_{i-1}:

    y_{2} = a_{0} * x_{2} + a_{1} * x_{1} + e_{1}
    y_{3} = a_{0} * x_{3} + a_{1} * x_{2} + e_{2}
    y_{4} = a_{0} * x_{4} + a_{1} * x_{3} + e_{3}
    ...
    y_{n} = a_{0} * x_{n} + a_{1} * x_{n-1} + e_{n-1}

where e_{k} is an error. These errors prevent ordinary least squares (OLS) from estimating the parameters a_{0} and a_{1} accurately. The problem is addressed by bringing in new information about the observations x_{i}, provided that it is available or can be obtained:

    x_{1} = b_{0} * z_{1} + b_{1} * z_{0} + err_{1}
    x_{2} = b_{0} * z_{2} + b_{1} * z_{1} + err_{2}
    x_{3} = b_{0} * z_{3} + b_{1} * z_{2} + err_{3}
    ...
    x_{n} = b_{0} * z_{n} + b_{1} * z_{n-1} + err_{n}

where z_{k} are new observable values and err_{k} are errors that are not as bad as e_{k} and can be filtered out by applying OLS.

Written in matrix notation, the first system is

    y = X a + e

We use bold font for vectors and matrices, lower case letters for vectors and upper case letters for matrices. As will be seen, following standard notation makes the explanation simple and clear.

Now we rewrite the additional information shown above by rearranging its elements and assembling them into a new matrix equation. Here I provide it in scalar notation first.
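The scalar form itself does not appear in this copy of the text. The block below is a reconstruction, not the author's original layout. It assumes that each row of X holds x_{i} and x_{i-1}, as in the first system, and that Z has three columns, as stated later. The zeros and the repeated b_{0}, b_{1} follow from the scalar equations for x_{i}; the OLS step further on treats all six entries of B as free.

```latex
\begin{pmatrix}
z_{2} & z_{1} & z_{0} \\
z_{3} & z_{2} & z_{1} \\
\vdots & \vdots & \vdots \\
z_{n} & z_{n-1} & z_{n-2}
\end{pmatrix}
\begin{pmatrix}
b_{0} & 0 \\
b_{1} & b_{0} \\
0 & b_{1}
\end{pmatrix}
=
\begin{pmatrix}
x_{2} & x_{1} \\
x_{3} & x_{2} \\
\vdots & \vdots \\
x_{n} & x_{n-1}
\end{pmatrix}
```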
The errors err_{k} are dropped, but we assume that all matrix elements are known approximately. Rewriting this equation in matrix notation gives

    Z B = X

Now we can clearly see that the matrix X is identical in both matrix equations, y = X a + e and Z B = X, while the matrix B and the vector a are unknown. Only a is of interest in the research, and the errors e do not allow OLS to be applied to the first equation directly. Here we apply an elementary trick to resolve this simple algebraic problem. We find the matrix B by applying OLS:

    B = (Z^{T} Z)^{-1} Z^{T} X

Now we put the newly found B into the previous matrix equation; the reason will become clear further on. The equality holds approximately, because the errors err_{k} were dropped:

    Z (Z^{T} Z)^{-1} Z^{T} X = X

At this point we replace X in the original equation by the left-hand side of the expression above:

    y = Z (Z^{T} Z)^{-1} Z^{T} X a + e

And now we multiply both sides by Z^{T}. Since Z^{T} Z (Z^{T} Z)^{-1} is the identity matrix, the result is

    Z^{T} y = Z^{T} X a + Z^{T} e

Since the matrix Z has three columns, the vector Z^{T} y has three elements and the matrix Z^{T} X has size 3 by 2. If the columns of Z are correlated with the columns of X and with the vector y, but not correlated with the error e, then the elements of Z^{T} X and Z^{T} y must be significantly larger than the elements of Z^{T} e. This provides effective filtering of the errors e, which is the goal of the method. The latter system, three equations in the two unknowns a_{0} and a_{1}, can be solved by OLS.

It is critical to pay attention to the indexes in the data arrays when building the matrices of the system. While the columns of Z can be rearranged, the indexes of X and Z must be chosen in a coordinated way.

The last part of the explanation of TSLS introduces widely used terminology. I believe that terminology must be explained at the end of an article, when the reader already understands the method, and not in the first paragraph as other technical writers do. The elements of the matrix Z are called instrumental variables or instruments. They are always denoted as Z to make them distinct. The elements of the vectors x and y are called endogenous variables.
The name comes from economics. It is used for quantities that influence each other, so that neither of them is strictly dependent or independent. An example is demand and price in a market: demand influences the price, and the price affects the demand. This is the property attributed to the original system: the error e cannot be filtered by OLS because x and y are endogenous. They influence each other, or are related in such a way that the original system has these specific errors, which makes the application of OLS ineffective. The opposite of an endogenous variable is an exogenous variable.
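
The matrix steps described earlier can be tried numerically: build X and Z with coordinated indexes, form Z^{T} y and Z^{T} X, and solve the resulting system by least squares. The following is a minimal sketch in Python with NumPy, not a reference implementation. The simulated data, the parameter values, the function names, and the way the error e is made to share a shock with x are all assumptions chosen for illustration.

```python
import numpy as np


def simulate(n, a, b, rng):
    """Generate y, x, z indexed 0..n (illustrative data, not from the article)."""
    z = rng.normal(size=n + 1)   # instruments z_0 .. z_n
    u = rng.normal(size=n + 1)   # shock shared by x and e (assumed source of endogeneity)
    v = rng.normal(size=n + 1)   # part of e that is independent of x
    x = np.zeros(n + 1)
    x[1:] = b[0] * z[1:] + b[1] * z[:-1] + u[1:]    # x_1 .. x_n, with err_k = u_k
    e = 0.8 * u + 0.6 * v                           # error in the y equation, correlated with x
    y = np.zeros(n + 1)
    y[2:] = a[0] * x[2:] + a[1] * x[1:-1] + e[2:]   # y_2 .. y_n
    return y, x, z


def build_matrices(y, x, z):
    """Rows i = 2..n: y_i, [x_i, x_{i-1}], [z_i, z_{i-1}, z_{i-2}]."""
    yv = y[2:]
    X = np.column_stack([x[2:], x[1:-1]])
    Z = np.column_stack([z[2:], z[1:-1], z[:-2]])
    return yv, X, Z


def first_stage(X, Z):
    """B = (Z^T Z)^{-1} Z^T X, computed by least squares."""
    return np.linalg.lstsq(Z, X, rcond=None)[0]


def iv_estimate(yv, X, Z):
    """Solve Z^T y = Z^T X a (3 equations, 2 unknowns) by least squares."""
    return np.linalg.lstsq(Z.T @ X, Z.T @ yv, rcond=None)[0]


def ols_estimate(yv, X):
    """Plain OLS on y = X a + e, for comparison."""
    return np.linalg.lstsq(X, yv, rcond=None)[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y, x, z = simulate(20000, a=(2.0, -1.0), b=(1.0, 0.5), rng=rng)
    yv, X, Z = build_matrices(y, x, z)
    print("first-stage B:\n", first_stage(X, Z))
    print("OLS estimate of a:        ", ols_estimate(yv, X))
    print("instrument-based estimate:", iv_estimate(yv, X, Z))
```

With data generated this way, the plain OLS estimate of a_{0} is pulled away from the true value because e and x share the shock u, while the estimate obtained from Z^{T} y = Z^{T} X a should land close to the chosen a. The same row range i = 2..n is used for y, X and Z, which is the coordinated choice of indexes that the text asks for.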