<?xml version="1.0" encoding="UTF-8"?>
<paper xmlns="http://www.w3.org/2005/Atom">
  <title>Using the P90/P10 Index to Measure US Inequality Trends with Current Population Survey Data: A View from Inside the Census Bureau Vaults</title>
  <url>http://www.iser.essex.ac.uk/publications/working-papers/iser/2007-14</url>
  <summary>The vast majority of research on trends in labour earnings and income inequality since the 1970s in the USA has been based on public use files of the March Current Population Survey (CPS). In the cross-national comparative literature, CPS data are also commonly used to compare levels and trends in both labour earnings and income inequality in the USA with those in other industrialized countries. The most important source of standardized cross-sectional micro data on industrialized countries - the Luxembourg Income Study (LIS) - uses the public use version of the CPS data for the USA. The public use CPS data are also a major source of information about US inequality in the World Income Inequality Database.

To protect the confidentiality of CPS respondents, top codes are imposed on every source of income above a specific value, with the top code value differing by income source. For example, if someone reports earning a million dollars, then the wage and salary value for that respondent that researchers see in the data set is not one million dollars, but a lower value (the top-coded value). Household income is more likely to be top coded than wage and salary income because it aggregates a large number of income sources across individuals, each of which may be top coded; if any one source is top coded, then so is the aggregated income variable.

Top-coded data cause problems for inequality analysis because they censor the range of incomes that are observed. Inequality is underestimated because very high incomes appear as less-high incomes. This problem would be less of an issue for the analysis of inequality trends over time if the nature and extent of top coding were constant. However, CPS top codes have changed over time in a number of ways, leading to a potentially serious time-inconsistency problem for inequality analysis.

This time-inconsistency has led many researchers to use a measure of inequality that they believe will insulate them from the problem. The measure is the ratio of the 90th to the 10th percentile of a distribution (P90/P10). If you lined everyone up in ascending order of income, the 10th percentile would be the income of the person one-tenth of the way along the parade, and the 90th percentile would be the income of the person nine-tenths of the way along. The greater the difference between these two incomes - the larger that P90/P10 is - the greater the degree of inequality. P90/P10 contrasts with other commonly used summary measures of inequality such as the Gini coefficient, Theil index, or coefficient of variation, each of which uses information about all income values, rather than only two. In the US labour economics literature, P90/P10 is the most commonly used measure of wage or labour earnings dispersion. In the US income inequality literature, P90/P10 is also a standard measure of inequality in the distributions of size-adjusted family or household income.

Researchers have implicitly assumed that P90/P10 is not affected by censoring, reasoning that the fraction of observations affected by censoring of total wages and salaries, labour earnings or income is less than 10 percent. While this is true, censoring in the CPS data takes place at the level of each income source, not at the level of income totals, so some values below the 90th percentile of total labour earnings, and especially below the 90th percentile of income, are censored. As a result, even apparently modest amounts of censoring in the population as a whole may affect estimates of P90/P10.

Addressing the issues raised by censoring requires the use of internal March CPS data, and we have been able to gain access to them for the very first time for this purpose. Our analysis considers data for income years 1975-2004. We examine three distributions of income that are commonly assessed in the labour and income inequality literatures: (i) wages and salaries income among individuals working full-time, full-year for wages; (ii) total earnings income among full-time, full-year workers (wages and salaries plus farm and non-farm self-employment earnings); and (iii) household income among all individuals.

Our paper makes three contributions. First, using innovative bounding methods, we show that calculating P90/P10 with public use CPS data - even when Census Bureau cell means are used for top-coded values - does not completely obviate the problem of time-inconsistency, especially for those interested in trends in the inequality of individuals' size-adjusted household income. Second, we offer a means by which researchers may reduce problems caused by censoring. Because we have access to the internal CPS data, we have been able to create consistent cell mean values for all top-coded values in all years of internal data made available to us (1975-2004). When integrated with the public use CPS data, these cell means offer a plausible correction for time-inconsistency problems in those data.

Our third contribution concerns the assessment of longer-term US inequality trends. When we compare estimates of P90/P10 based on our adjusted public use CPS data with estimates of Gini coefficients based on either the internal or public use CPS data consistently top coded to control for time inconsistencies, we find that the trends in P90/P10 differ significantly from the trends in either of the two Gini coefficient series. Hence, researchers should be cautious in making inferences about trends in the inequality of the distributions of wages and salaries income, labour earnings income, or size-adjusted household income over the last three decades based on changes in the relative position of only two points in each of those distributions.</summary>
  <abstract>The March Current Population Survey (CPS) is the primary data source for estimation of levels and trends in labor earnings and income inequality in the USA. Time-inconsistency problems related to top coding in these data have led many researchers to use the ratio of the 90th to the 10th percentile of these distributions (P90/P10) rather than a more traditional summary measure of inequality. With access to public use and restricted-access internal CPS data, and using bounding methods, we show that using P90/P10 does not completely obviate time-inconsistency problems, especially for household income inequality trends. Using internal data, we create consistent cell mean values for all top-coded public use values that, when used with public use data, closely track inequality trends in labor earnings and household income estimated from internal data. But estimates of longer-term inequality trends with these corrected data based on P90/P10 differ from those based on the Gini coefficient. The choice of inequality measure matters.</abstract>
  <paper_series>Working Paper</paper_series>
  <series_number>2007-14</series_number>
  <published_date>2007-06-02</published_date>
  <author>
    <firstname>Shuaizhang</firstname>
    <familyname>Feng</familyname>
    <instutitue>Shanghai University of Finance and Economics</instutitue>
  </author>
  <author>
    <firstname>Stephen</firstname>
    <familyname>Jenkins</familyname>
    <instutitue>Institute for Social and Economic Research</instutitue>
    <email>stephenj@essex.ac.uk</email>
  </author>
  <author>
    <firstname>Richard</firstname>
    <familyname>Burkhauser</familyname>
    <instutitue>Cornell University</instutitue>
    <email>rvb1@cornell.edu</email>
  </author>
</paper>
