Publication type
Survey Futures Working Paper Series
Series Number
13
Series
Survey Futures Working Paper Series
Authors
Publication date
April 1, 2026
Summary
We present the results of a new approach to measuring the occupations of survey respondents using Large Language Models (LLMs). Occupation is notoriously difficult to measure accurately because of the very large number of occupations and the technical way they are described in standard classifications. As a result, measurement and classification are usually not conducted simultaneously: open responses about job title and tasks are coded in a subsequent stage of ‘office coding’. In our new approach, which we call SOCbot, an LLM integrated into the questionnaire scripting software codes the job title response to the occupational classification in real time during the interview. Where the job title does not contain sufficient information to be coded with confidence, the LLM probes for further relevant details such as job tasks, industry, and qualifications. SOCbot can also be used offline in static mode on previously collected response data. Our results demonstrate that the approach attains coding reliability comparable to that of trained human coders. We also demonstrate that the approach is feasible in large-scale survey operations and has significant potential to reduce respondent burden, lower costs, and yield more timely and accurate data.
Subjects
Link
https://surveyfutures.net/working-papers/