I’ve recently taken on the daunting task of reading Wooldridge (2010) cover to cover. If I’m lucky I’ll remember 10-15% of it, but hey, at least I tried. Anyhow, a couple of days ago I read a passage that caught my interest, and since I wanted to make sense of it I started out by running Monte Carlo simulations of it. I also got kind of inspired by the blog of a new friend of mine, Rafael Ahlskog, so I thought I’d dust off the old blogging skills and start writing about some metrics for the few nerds out there. So here’s the story. Consider the structural model,

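$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_K x_K + \gamma_1 q + \gamma_2 x_K q + v,$$

where $q$ is some unobserved variable that happens to vary with $x_K$. For some reason we don’t observe $q$, but we have a proxy variable for it, denoted by $z$, which fulfills (i) the **redundancy condition**, $E(y \mid x, q, z) = E(y \mid x, q)$, and (ii) $q$ is mean-independent of $x$ when we control for $z$, $E(q \mid x, z) = E(q \mid z)$. Let’s assume that $q$ can be written as a linear function of $z$ such that,

$$q = \theta_0 + \theta_1 z + r.$$

If $x_K$ and $q$ weren’t interacted we’d be all happy and could consistently estimate $\beta_K$. However, now that $x_K$ and $q$ are interacted, the coefficient on $x_K$ becomes biased. To see that, let’s simplify the structural model and consider only the three-variable case with an intercept and a single regressor $x$ ($K = 1$), so that $y = \beta_0 + \beta_1 x + \gamma_1 q + \gamma_2 x q + v$. We are interested in the conditional expectation of $y$ given $x$ and $z$. Using our proxy variable we get,

$$
\begin{aligned}
E(y \mid x, z) &= \beta_0 + \beta_1 x + \gamma_1 E(q \mid x, z) + \gamma_2 x E(q \mid x, z) \\
&= (\beta_0 + \gamma_1\theta_0) + (\beta_1 + \gamma_2\theta_0)\,x + \gamma_1\theta_1 z + \gamma_2\theta_1 x z,
\end{aligned}
$$

where we have used the fact that $E(r \mid z) = 0$, which follows from the linear projection when $E(q \mid z)$ is linear in $z$, and $E(q \mid x, z) = E(q \mid z)$, which follows from our second assumption about the proxy variable. We see that the coefficient on $x$ is not merely its structural parameter $\beta_1$ but is biased by $\gamma_2\theta_0$. In other words, $\operatorname{plim} \hat{\beta}_1 = \beta_1 + \gamma_2\theta_0$. So when the variable of interest is interacted with the unobserved variable (i.e. $\gamma_2 \neq 0$) we cannot consistently estimate $\beta_1$ unless we assume that $\theta_0 = 0$, which I take as being the same as assuming that $E(q) = 0$, given that the mean value of the proxy is zero? (Wooldridge has a typo here, or, as he called it when I emailed him, “a thinko”: the book states a different condition as the one needed.) This is kind of amazing! How often can we credibly assume that the mean of our unobserved variable is equal to zero? Take the classic example of ability, for instance. Wooldridge suggests demeaning the proxy variable as a solution to this problem, but that does not change the bias if $E(q) \neq 0$, which becomes obvious in the simulations I ran. I’ll post these in an upcoming segment.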
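
In the meantime, here is a minimal Monte Carlo sketch of the argument in Python. It is only an illustration under assumed values, not the simulations referred to above. The setup is one regressor `x`, a mean-zero proxy `z`, an unobservable `q = theta0 + theta1 * z + r`, and arbitrarily chosen parameters (`beta1 = 2`, `gamma2 = 0.5`, and so on). The function name `simulate_x_coef` and the whole data-generating process are hypothetical choices. The proxy is demeaned before it is interacted with `x`, and the function returns the average OLS coefficient on `x` from regressing `y` on `x`, `z` and `x * z`.

```python
import numpy as np


def simulate_x_coef(theta0, n=2_000, reps=200, seed=0):
    """Average OLS coefficient on x across Monte Carlo replications."""
    rng = np.random.default_rng(seed)

    # Illustrative (assumed) parameter values.
    beta0, beta1 = 1.0, 2.0    # structural intercept and slope on x
    gamma1, gamma2 = 1.0, 0.5  # effect of q and of the x*q interaction
    theta1 = 1.0               # slope of q on the proxy z

    coefs = np.empty(reps)
    for i in range(reps):
        z = rng.normal(size=n)            # proxy with mean zero
        r = rng.normal(size=n)            # projection error, independent of x and z
        q = theta0 + theta1 * z + r       # unobserved variable
        x = 0.5 * z + rng.normal(size=n)  # x varies with q only through z
        v = rng.normal(size=n)            # structural error
        y = beta0 + beta1 * x + gamma1 * q + gamma2 * x * q + v

        # Demean the proxy before interacting it with x.
        zd = z - z.mean()
        X = np.column_stack([np.ones(n), x, zd, x * zd])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        coefs[i] = b[1]                   # coefficient on x
    return coefs.mean()


coef_zero_mean = simulate_x_coef(theta0=0.0)     # E(q) = 0
coef_nonzero_mean = simulate_x_coef(theta0=2.0)  # E(q) = 2

print(f"E(q) = 0: average coefficient on x = {coef_zero_mean:.3f}")
print(f"E(q) = 2: average coefficient on x = {coef_nonzero_mean:.3f}")
```

Under these assumptions the average coefficient on `x` should sit close to `beta1` when `theta0 = 0`, and close to `beta1 + gamma2 * theta0` otherwise, even though the proxy has been demeaned.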

So the take-home from all of this is: if you believe that your variable of interest is interacted with an unobserved variable, using a proxy for that variable won’t enable you to consistently estimate the parameter of interest unless you assume that the expected value of the unobserved variable is zero. Or is it sufficient that ?

PS. These blog posts are far from scientific in the sense that they lack peer review when published. I’d be happy to hear from anyone who finds errors or simply disagrees with the conclusions. DS.