Model-based clustering of time-dependent categorical sequences with application to the analysis of major life event patterns

Publication type

Journal Article


Publication date

June 15, 2021


Clustering categorical sequences is a problem that arises in many fields. A few techniques are available in this framework, but none of them takes into account the possible temporal character of transitions from one state to another. A mixture of Markov models is proposed, in which transition probabilities are represented as functions of time. The corresponding expectation–maximization algorithm is discussed along with related computational challenges. The effectiveness of the proposed procedure is illustrated in a set of simulation studies, in which it outperforms four alternative approaches. The method is applied to major life event sequences from the British Household Panel Survey. As reflected by the Bayesian Information Criterion, the proposed model performs substantially better than its competitors. Analysis of the results and the related transition probability plots reveals two groups of individuals: people with a conventional life course and those encountering some challenges.
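As a rough illustration of the idea summarised above, the following Python sketch computes cluster membership probabilities (the E-step of an expectation–maximization scheme) for a mixture of Markov components whose transition probabilities change with time. The abstract does not give the paper's parameterization, so the logit-linear dependence on time used here, and all names (`transition_row`, `sequence_loglik`, `e_step`, `alpha`, `beta`), are assumptions made for illustration only, not the authors' implementation.

```python
import math

def transition_row(alpha_row, beta_row, t):
    """Transition probabilities out of one state at time t.

    Assumed form: softmax over destination states of alpha + beta * t.
    """
    scores = [a + b * t for a, b in zip(alpha_row, beta_row)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_loglik(seq, init, alpha, beta):
    """Log-likelihood of one categorical sequence under one component."""
    ll = math.log(init[seq[0]])
    for t in range(1, len(seq)):
        prev, curr = seq[t - 1], seq[t]
        ll += math.log(transition_row(alpha[prev], beta[prev], t)[curr])
    return ll

def e_step(sequences, weights, components):
    """Posterior probability of each component for each sequence."""
    responsibilities = []
    for seq in sequences:
        logs = [math.log(w) + sequence_loglik(seq, *comp)
                for w, comp in zip(weights, components)]
        top = max(logs)
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        responsibilities.append([u / total for u in unnorm])
    return responsibilities

# Hypothetical two-state, two-component example (values are made up).
# Each component is (initial probabilities, alpha, beta).
static = ([0.5, 0.5], [[1.0, -1.0], [-1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]])
drifting = ([0.5, 0.5], [[1.0, -1.0], [-1.0, 1.0]], [[0.0, 0.5], [0.0, 0.5]])
sequences = [[0, 0, 0, 0], [0, 0, 1, 1]]
print(e_step(sequences, [0.5, 0.5], [static, drifting]))
```

Setting every `beta` entry to zero recovers an ordinary time-homogeneous Markov mixture, which is the kind of model the abstract contrasts the proposal with. A full procedure would also need an M-step to update the mixing weights and the component parameters.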

Published in

Statistical Analysis and Data Mining

Volume and page numbers

Volume: 14, p. 230-240








