Publication type
Survey Futures Working Paper Series
Series Number
15
Series
Survey Futures Working Paper Series
Authors
Publication date
April 15, 2026
Summary:
This paper investigates whether large language models (LLMs) can approximate canonical human question review procedures. We compare three prompt variants for simulated cognitive testing against a simpler expert review approach, using a set of 20 questions containing deliberately embedded problems, with 7 items from the European Social Survey used as a control set. The best-performing configuration, a guided cognitive testing variant, detects 75% of known problems with a small false positive rate. As with their human counterparts, simulated cognitive testing and expert review detect different types of problems, suggesting the two approaches are complementary and could fruitfully be used in tandem. Detection and false positive rates are sensitive to prompt design and model choice: structured prompts that work well on open-weight models perform poorly on the proprietary model tested, and open-weight models achieve higher detection rates but much poorer discrimination between flawed and well-designed questions. A full evaluation of a 20-item questionnaire completes in under an hour at a cost of a few dollars, orders of magnitude faster and cheaper than human pretesting. These findings suggest that LLM-based question testing has the potential to provide a useful complement to human pretesting, particularly for early-stage screening of large item sets and in situations where resource constraints mean human testing is not possible.
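The working paper's actual prompts and model configuration are not reproduced on this page; the sketch below is only a hypothetical illustration of how a simulated cognitive testing pass over a small item set might be scripted against an LLM API. The prompt wording, the model name, and the `review_question` helper are assumptions for illustration, not the authors' procedure.

```python
# Hypothetical sketch of an LLM-based question review pass.
# Prompt wording, model choice, and example items are illustrative
# assumptions, not the configuration evaluated in the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

COGNITIVE_TESTING_PROMPT = (
    "You are simulating a cognitive interview respondent. Read the survey "
    "question below, think aloud about how you would interpret and answer it, "
    "then list any comprehension, recall, judgement, or response problems you "
    "encountered. If the question seems well designed, say so explicitly.\n\n"
    "Question: {question}"
)

def review_question(question: str, model: str = "gpt-4o") -> str:
    """Ask the model to simulate a cognitive test of a single survey item."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": COGNITIVE_TESTING_PROMPT.format(question=question),
        }],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Two illustrative items: one with a deliberate flaw (double-barrelled)
    # and one modelled on a well-tested item, to mimic a flawed/control split.
    items = [
        "How satisfied are you with your pay and your working hours?",
        "All things considered, how satisfied are you with your life as a whole nowadays?",
    ]
    for item in items:
        print(f"--- {item}\n{review_question(item)}\n")
```

Looping a prompt of this kind over a 20-item questionnaire is what makes the turnaround times and costs described in the abstract plausible, since each item requires only a handful of model calls.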
Subjects
Link
https://surveyfutures.net/working-papers/