Saturday, April 13, 2013


Missing Data:
When people don’t answer survey questions

Elsewhere I’ve called missing data “the silent killer of valid inference.”  It is a silent killer, and a problem that is easy to overlook, because, well . . . it’s missing, not there.

Missing data is a problem in all forms of research, from interviews to experiments, but it is an especially tricky and often overlooked issue in survey research.  Kathy Godfrey recently posted (in an e-mail discussion) a very clear and concise description of the problem and how to handle it. Her comments are so helpful that, with her permission, I’m re-posting them here.

What you need to do when writing, conducting, and analyzing a survey is make sure to get the maximum amount of information from the people (probably a large number of them) who do not respond the way you had hoped.  Kathy’s “four flavors” of non-response are actually four important variables.  A researcher can learn a lot by coding and analyzing them.

Original Message:
Sent: 04-10-2013
From: Katherine Godfrey
Subject: Multiple imputation of "I don't know/ I don't remember"

Depending on the situation, there are at least four "flavors" of non-response, any one of which might be what's behind someone failing to answer a question (and this ignores the people who meant to respond, but simply goofed):

1. No Answer (deliberately not answering, and telling you so)
2. Does Not Apply (the question does not apply to respondent, and thus can't be answered)
3. Don't Know/Don't Remember (would answer if could, but can't)
4. No Preference/Don't Care (this is actually a real answer)

Here's an example to hopefully clarify:
Imagine a pollster asking people on the street, "Who will you vote for in the Senatorial election next week, Smith or Jones?"  The following answers are all possible:

1. "None of your business!" (No Answer)
2. "I don't live in this state." (Does Not Apply)
3. "I haven't decided yet; I'm still sorting through the candidates' stands on the issues." (Don't Know)
4. "They're both the same; I may just toss a coin in the voting booth--or stay home." (No Preference)
Not to mention the people that just walk past the pollster, leaving him to wonder if they're deliberately ignoring him or simply didn't hear him.

The second category (Does Not Apply) can turn up as "structural zeroes" in frequency analysis contexts. The difference between "Don't Know" and "No Preference" is subtle, but I think it's real.  The former says that there is (or was, or will be) an answer, but the respondent can't give it now.  The latter says that there is a known answer, and the answer is not to have a particular feeling/opinion.

The "don't know/don't remember" answer is actually more informative than a missing (non-response) answer, since a non-response could be in any of these non-response categories.  If possible, and if the sample size supports it, it could be used as a third category beyond a yes/no binary.  (For example, perhaps people who say they don't remember ever driving drunk are definitely more likely to have had an auto accident in the last year than those who say "no," but also substantially less likely to have done so than those who say "yes.")

I first learned about these types of non-response from my mother, who worked in survey research.  She was adamant that her surveys should include Don't Know, and DNA (does not apply) options for questions along with NA (no answer), to remove at least some of the reasons for people to feel that they had to leave a question blank because they couldn't or wouldn't answer.  Partial information is better than the dreaded DNR (Did Not Reply), which tells you nothing.

By the same token, I cringe whenever I see computer-administered Likert-scale survey questions that not only have no option for not answering (or indicating non-applicability), but have an even number of response options, thus not even allowing for "no preference."
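
To make the idea of coding these categories concrete, here is a minimal Python sketch of how the four flavors from the message above, plus the blank DNR, might be kept as separate codes instead of being collapsed into a single "missing" value. Everything in it is an assumption made for illustration: the code labels (NA, DNA, DK, NP, DNR), the names NonResponse, code_response, and tabulate, and the sample responses, which are made up and not real survey data.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Optional, Union


class NonResponse(Enum):
    """Hypothetical codes for the kinds of non-response described above."""
    NO_ANSWER = "NA"        # deliberately not answering, and saying so
    DOES_NOT_APPLY = "DNA"  # the question does not apply to the respondent
    DONT_KNOW = "DK"        # would answer if they could, but can't
    NO_PREFERENCE = "NP"    # a real answer: no particular opinion
    DID_NOT_REPLY = "DNR"   # left blank; the reason is unknown


# Substantive answers to the example question (Smith or Jones).
VALID_CHOICES = {"Smith", "Jones"}


def code_response(raw: Optional[str]) -> Union[str, NonResponse]:
    """Return the substantive answer, or the matching non-response code."""
    if raw is None or raw.strip() == "":
        return NonResponse.DID_NOT_REPLY
    text = raw.strip()
    if text in VALID_CHOICES:
        return text
    # Raises ValueError if the entry is not one of the known codes,
    # so data-entry mistakes are not silently counted as missing.
    return NonResponse(text)


def tabulate(raw_responses: Iterable[Optional[str]]) -> Counter:
    """Count every category separately, including each kind of non-response."""
    return Counter(code_response(r) for r in raw_responses)


# Made-up responses, for illustration only.
sample = ["Smith", "DK", None, "NA", "Jones", "NP", "DNA", "", "DK"]
counts = tabulate(sample)

for key, n in counts.items():
    label = key.name if isinstance(key, NonResponse) else key
    print(label, n)
```

Keeping DK and NP apart from DNR is what would later let an analyst treat "Don't Know" as a third category next to yes and no, sample size permitting, rather than discarding it with the blanks.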
