Publication type
Survey Futures Working Paper Series
Series Number
15
Series
Survey Futures Working Paper Series
Authors
Publication date
April 15, 2026
Summary:
This paper investigates whether large language models (LLMs) can approximate canonical human question review procedures. We compare three prompt variants for simulated cognitive testing against a simpler expert review approach, using a set of 20 questions containing deliberately embedded problems, with 7 items from the European Social Survey used as a control set. The best-performing configuration, a guided cognitive testing variant, detects 75% of known problems with a small false positive rate. As with their human counterparts, simulated cognitive testing and expert review detect different types of problems, suggesting the two approaches are complementary and could fruitfully be used in tandem. Detection and false positive rates are sensitive to prompt design and model choice: structured prompts that work well on open-weight models perform poorly on the proprietary model tested, and open-weight models achieve higher detection rates but much poorer discrimination between flawed and well-designed questions. A full evaluation of a 20-item questionnaire completes in under an hour at a cost of a few dollars, orders of magnitude faster and cheaper than human pretesting. These findings suggest that LLM-based question testing has the potential to provide a useful complement to human pretesting, particularly for early-stage screening of large item sets and in situations where resource constraints mean human testing is not possible.
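The working paper's actual prompts and model configuration are not reproduced on this page; the sketch below is only a hypothetical illustration of how a simulated cognitive testing pass over a small item set might be scripted against an LLM API. The prompt wording, the model name, and the `review_question` helper are assumptions for illustration, not the authors' procedure.

```python
# Hypothetical sketch of an LLM-based question review pass.
# Prompt wording, model choice, and example items are illustrative
# assumptions, not the configuration evaluated in the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

COGNITIVE_TESTING_PROMPT = (
    "You are simulating a cognitive interview respondent. Read the survey "
    "question below, think aloud about how you would interpret and answer it, "
    "then list any comprehension, recall, judgement, or response problems you "
    "encountered. If the question seems well designed, say so explicitly.\n\n"
    "Question: {question}"
)

def review_question(question: str, model: str = "gpt-4o") -> str:
    """Ask the model to simulate a cognitive test of a single survey item."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": COGNITIVE_TESTING_PROMPT.format(question=question),
        }],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Two illustrative items: one with a deliberate flaw (double-barrelled)
    # and one modelled on a well-tested item, to mimic a flawed/control split.
    items = [
        "How satisfied are you with your pay and your working hours?",
        "All things considered, how satisfied are you with your life as a whole nowadays?",
    ]
    for item in items:
        print(f"--- {item}\n{review_question(item)}\n")
```

Looping a prompt of this kind over a 20-item questionnaire is what makes the turnaround times and costs described in the abstract plausible, since each item requires only a handful of model calls.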
Subjects
Link
https://surveyfutures.net/working-papers/