I’m trying to make sense of all the missing data. It seems very strange to have such a high non-response rate (nearly 20%) to simple demographic questions such as gender and student status, which suggests a problem with the data.
You say here that a ‘survey response’ was generated each time somebody opened the survey, even if they answered no questions. Does that mean there wasn’t a ‘complete and submit’ step? Was every partially completed survey considered a separate ‘person’? If so, was there any way to determine if individuals were opening multiple times?
If each opening of the survey, however complete, was considered an entry (i.e. whatever data was entered has been counted as a person), that would suggest that individuals making several attempts to complete the survey are being counted multiple times. That would be supported if the non-response rates are generally higher later in the survey, which I can’t tell from this report.
If multiple attempts can’t be excluded, these numbers are unlikely to be valid. Missing data is a difficult problem, but my first thought is that a safer approach would be to include only complete responses.
When ACE and HRC talked to statisticians and survey researchers as part of developing our Survey Guidelines for animal charities beginning to evaluate their own work, they consistently said that demographic questions should go at the end of the survey because they have high non-response rates and some people don’t proceed past questions they aren’t answering. So while it’s intuitively surprising that people don’t answer these simple questions, it’s not obviously different from what (at least some) experts would expect. I don’t know, however, whether 20% is an especially high non-response rate even taking that into account.
That’s interesting to know, thank you for sharing it! Looking at this study (comparing mail and web surveys), they cite non-response rates for demographic items of 2-5%. However, I don’t know how similar the target population here is to the ‘general population’ in these behaviours. http://surveypractice.org/index.php/SurveyPractice/article/view/47/html
Yes, these questions were right at the end. You can see the order of the questions in the spreadsheet that Peter linked to—they correspond to the order of the columns.
Thanks Tom. I’m limited in my spreadsheet wrangling at the moment, I’m afraid, but looking at the non-response rates cited in the document above and comparing them to the order of the questions, non-responses seem to be low (30-50) until the question on income and donation specifics, after which they are much higher (150-220). A question that requires financial specifics seems likely to require someone to stop and seek documents, so it could well cause someone to abandon the survey, at least temporarily. If somebody abandoned the survey at that point, would the information they had entered so far be submitted? Or would they have to get to the end and click submit for any of their data to be included?
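For what it’s worth, the check I have in mind is just counting blank cells per question in column order, something like the sketch below. The file name and the assumption that the exported columns follow question order are mine, not something I’ve confirmed against the actual spreadsheet:

```python
# Rough sketch: count non-responses per question, in question (column) order.
# "survey_responses.csv" is a placeholder for however the spreadsheet is exported.
import pandas as pd

responses = pd.read_csv("survey_responses.csv")

# Number of blank cells per column, keeping the original column order.
non_response_counts = responses.isna().sum()

# Print each question with its non-response count, to see whether the counts
# jump after the income/donation questions.
for question, missing in non_response_counts.items():
    print(f"{missing:5d}  {question}")
```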
That’s a good point; it could well have happened, and it’s something we should consider changing.
The questions were split into a few pages, and people’s answers got saved when they clicked the ‘Continue’ button at the bottom of each page—so if they only submitted 2 pages, only those pages would be saved. We searched for retakes and saw a small number which we deleted.
Oh cool. How were you able to identify duplicates?
I looked for identical names or email addresses, and then manually checked them. The other thing we could do would be to record people’s IP addresses and look for identical ones. However, I chose not to record them due to privacy concerns. I would promise not to use them to try to guess who people are, and this identifying data never gets shared with anyone but me—I’d appreciate feedback on whether people would be comfortable with IP addresses getting recorded given this.
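For anyone curious what that kind of check looks like in practice, here is a minimal sketch. It assumes the export has ‘name’ and ‘email’ columns (those column names are my guess, not necessarily what the real file uses), and only flags rows for manual review rather than deleting anything automatically:

```python
# Flag possible retakes: rows sharing a non-blank name or email with another row.
import pandas as pd

responses = pd.read_csv("survey_responses.csv")  # placeholder file name

# Only compare non-blank values, so rows with missing name/email aren't matched
# to each other.
dup_name = responses["name"].notna() & responses.duplicated("name", keep=False)
dup_email = responses["email"].notna() & responses.duplicated("email", keep=False)

possible_retakes = responses[dup_name | dup_email]
print(possible_retakes[["name", "email"]])
```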
Tom was the one who created the survey-taking architecture, so I’ve asked him to get back to you, just to make sure I don’t give you incorrect information.