# Proxy variable interactions and bias

I’ve recently taken on the daunting task of reading Wooldridge (2010) cover to cover. If I’m lucky I’ll remember 10-15% of it, but hey, at least I tried. Anyhow, a couple of days ago I read a passage that caught my interest, and since I wanted to make sense of it I started out by running Monte Carlo simulations of it. I also got kind of inspired by the blog of a new friend of mine, Rafael Ahlskog, so I thought I’d dust off the old blogging skills and start writing about some metrics for the few nerds out there. So here’s the story. Consider the structural model,

$\displaystyle y= \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K + \gamma_1 q + \gamma_2 x_K q$

where $\displaystyle q$ is some unobserved variable that happens to vary with $\displaystyle x_K$. We don’t observe $\displaystyle q$, but we have a proxy variable for it, denoted by $\displaystyle z$, which fulfills (i) the redundancy condition $\displaystyle \mathbb{E}[y \mid x_1, x_2, \dots, x_K, q, z] = \mathbb{E}[y \mid x_1, x_2, \dots, x_K, q]$ and (ii) $\displaystyle Cov(q,x_j)=0 \; \forall j=1,2,\dots,K$ once we control for $\displaystyle z$. Let’s assume that $\displaystyle q$ can be written as a linear function of $\displaystyle z$ such that,

$\displaystyle q= \theta_0 + \theta_1 z + r$

If $\displaystyle x_K$ and $\displaystyle q$ weren’t interacted we’d all be happy, and we could consistently estimate $\displaystyle \beta_j$. However, once $\displaystyle x_K$ and $\displaystyle q$ are interacted, the coefficient on $\displaystyle x_K$ becomes biased. To see that, let’s simplify the structural model and consider only the three-variable case with $\displaystyle x_1=1$ (the intercept), $\displaystyle x_K$ and $\displaystyle q$. We are interested in the conditional expectation of $\displaystyle y$ given $\displaystyle x_1 , x_K$ and $\displaystyle q$. Using our proxy variable we get,

$\displaystyle \begin{aligned} \mathbb{E}[y \mid \mathbf{x}, q] &= \beta_1 + \beta_K x_K + \gamma_1 q + \gamma_2 x_K q \\ &= \beta_1 + \beta_K x_K + \gamma_1 (\theta_0 + \theta_1 z + r) + \gamma_2 x_K (\theta_0 + \theta_1 z + r) \\ &= \beta_1 + \gamma_1 \theta_0 + x_K( \beta_K + \gamma_2 \theta_0) + \gamma_1 \theta_1 z + \gamma_2 \theta_1 x_K z \end{aligned}$

where, in the last step, the terms $\displaystyle \gamma_1 r$ and $\displaystyle \gamma_2 x_K r$ drop out because $\displaystyle \mathbb{E}[zr]=0$, which follows by definition of the linear projection, and $\displaystyle \mathbb{E}[x_K r]=0$, which follows from our second assumption about the proxy variable. We see that the coefficient on $\displaystyle x_K$ is not merely its structural parameter $\displaystyle \beta_K$: it is biased by $\displaystyle \gamma_2 \theta_0$. In other words, $\displaystyle \mathrm{plim} \; \hat{\beta}_K = \beta_K + \gamma_2 \theta_0$. So when the variable of interest $\displaystyle x_K$ is interacted with the unobserved variable (i.e. $\displaystyle \gamma_2 \neq 0$), we cannot consistently estimate $\displaystyle \beta_K$ unless we assume that $\displaystyle \theta_0=0$, which I take to be the same as assuming that $\displaystyle \mathbb{E}[q]=0$, given that the mean of the proxy is zero. (Wooldridge has a typo here, or “a thinko” as he called it when I emailed him: the book claims that what is needed is $\displaystyle \mathbb{E}[z]=0$.)

This is kind of amazing! How often can we credibly assume that the mean of our unobserved variable is equal to zero? Take the classic example of ability, for instance. Wooldridge suggests demeaning the proxy variable as a solution to this problem, but that does not change the bias if $\displaystyle \theta_0 \neq 0$, which becomes obvious in the simulations I ran. I’ll post these in an upcoming segment.
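Until then, here is a minimal sketch of how the result could be checked numerically. It is not the simulation mentioned above: the parameter values, the normal distributions, the way $\displaystyle x_K$ is generated and the function name `simulate_beta_k` are all assumptions made purely for illustration. The proxy is drawn with mean zero, so $\displaystyle \theta_0$ equals $\displaystyle \mathbb{E}[q]$ in this setup.

```python
import numpy as np

# Illustrative parameter values (assumptions, not from Wooldridge or the post).
BETA_1, BETA_K = 1.0, 2.0      # structural intercept and coefficient on x_K
GAMMA_1, GAMMA_2 = 1.0, 0.5    # coefficients on q and on the interaction x_K * q
THETA_1 = 1.0                  # slope of the projection of q on z


def simulate_beta_k(theta0, n=200_000, seed=0):
    """OLS coefficient on x_K from regressing y on (1, x_K, z, x_K * z)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                 # proxy, mean zero by construction
    r = rng.normal(size=n)                 # projection error, independent of z and x_K
    q = theta0 + THETA_1 * z + r           # unobserved variable
    x_k = 0.5 * z + rng.normal(size=n)     # related to q only through z
    u = rng.normal(size=n)                 # structural error
    y = BETA_1 + BETA_K * x_k + GAMMA_1 * q + GAMMA_2 * x_k * q + u

    # q is unobserved, so the regression uses z and x_K * z in its place.
    X = np.column_stack([np.ones(n), x_k, z, x_k * z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


for theta0 in (0.0, 1.0):
    estimate = simulate_beta_k(theta0)
    predicted = BETA_K + GAMMA_2 * theta0
    print(f"theta0={theta0}: estimate={estimate:.3f}, "
          f"beta_K + gamma_2*theta0={predicted:.3f}")
```

Under these assumptions the estimate should sit close to $\displaystyle \beta_K$ when $\displaystyle \theta_0=0$ and close to $\displaystyle \beta_K + \gamma_2 \theta_0$ when $\displaystyle \theta_0=1$. A single large sample stands in for a full Monte Carlo loop here; repeating the draw over many seeds and averaging would be the more careful version.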

So the take-home from all of this is: if you believe that your variable of interest is interacted with an unobserved variable, using a proxy for that variable won’t enable you to consistently estimate the parameter of interest unless you assume that the expected value of the unobserved variable is zero. Or is it sufficient that $\displaystyle \mathbb{E}[q]=\mathbb{E}[z]$?
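One way to look at that last question, offered as a sketch rather than a settled answer: because the linear projection of $\displaystyle q$ on $\displaystyle z$ includes an intercept, the projection error has mean zero, $\displaystyle \mathbb{E}[r]=0$. Taking expectations of $\displaystyle q= \theta_0 + \theta_1 z + r$ then gives

$\displaystyle \theta_0 = \mathbb{E}[q] - \theta_1 \mathbb{E}[z]$

so $\displaystyle \theta_0=0$ exactly when $\displaystyle \mathbb{E}[q]=\theta_1 \mathbb{E}[z]$. On this reading, $\displaystyle \mathbb{E}[q]=\mathbb{E}[z]$ alone would be enough only when $\displaystyle \theta_1=1$ or when both means are zero.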

PS. These blog posts are far from scientific in the sense that they lack peer review when published. I’d be happy to hear from anyone who finds errors or simply disagrees with the conclusions. DS.