SOCbot: using large language models to measure and classify occupations in surveys

Publication type

Survey Futures Working Paper Series

Series Number

13

Series

Survey Futures Working Paper Series

Publication date

April 1, 2026

Summary:

We present the results of a new approach to measuring the occupations of survey respondents using Large Language Models (LLMs). Occupation is a notoriously difficult variable to measure accurately because of the very large number of occupations and the technical language in which standard classifications describe them. As a result, measurement and classification are usually not conducted simultaneously: open responses about job title and tasks are coded in a subsequent stage of ‘office coding’. In our new approach, which we call SOCbot, an LLM integrated into the questionnaire scripting software codes the job title response to the occupational classification in real time during the interview. Where the job title does not contain sufficient information to be coded with confidence, the LLM probes for further relevant details such as job tasks, industry, and qualifications. SOCbot can also be used offline in static mode on response data that have already been collected. Our results demonstrate that the approach attains rates of coder reliability comparable to those of trained human coders. We also demonstrate that the approach is feasible in large-scale survey operations and has significant potential to reduce respondent burden, lower costs, and yield more timely and accurate data.
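
The code-then-probe logic described in the summary can be illustrated with a minimal sketch. This is not the SOCbot implementation: the function names, the probe order, the 0.8 confidence threshold, and the occupation codes are all hypothetical, and a keyword-based stand-in replaces the LLM call so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical probe order, based on the details the summary mentions.
PROBES = ["job tasks", "industry", "qualifications"]


@dataclass
class CodingResult:
    code: Optional[str]      # occupation code, or None if never confident
    confidence: float        # classifier's confidence in the final code
    probes_asked: List[str]  # follow-up topics put to the respondent


def code_occupation(
    job_title: str,
    classify: Callable[[str], Tuple[str, float]],
    ask: Callable[[str], str],
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> CodingResult:
    """Code a job title, probing for more detail while confidence is low."""
    context = job_title
    asked: List[str] = []
    code, confidence = classify(context)
    for topic in PROBES:
        if confidence >= threshold:
            break
        # Ask the respondent for one more detail, then re-classify.
        answer = ask(topic)
        asked.append(topic)
        context = f"{context}; {topic}: {answer}"
        code, confidence = classify(context)
    if confidence < threshold:
        # Still ambiguous after all probes: leave the case uncoded.
        return CodingResult(None, confidence, asked)
    return CodingResult(code, confidence, asked)


def toy_classify(text: str) -> Tuple[str, float]:
    """Stand-in for the LLM call: keyword rules with made-up codes."""
    lowered = text.lower()
    if "primary school" in lowered:
        return "TEACH-PRIMARY", 0.95
    if "teacher" in lowered:
        return "TEACH-UNSPECIFIED", 0.40
    return "UNKNOWN", 0.0


# Live mode: the respondent is probed once, then the title can be coded.
answers = {"job tasks": "I teach a year 3 class in a primary school"}
result = code_occupation("Teacher", toy_classify, lambda t: answers.get(t, ""))
```

In this sketch the bare title "Teacher" falls below the threshold, so one probe on job tasks is asked before a code is assigned. A static, offline run could pass an `ask` function that looks up already collected answers instead of prompting a respondent.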

Link

https://surveyfutures.net/working-papers/

