7. Annex 1: Data preparation

7.1. Overview of data used

The analyses presented in this paper were produced using data from the National Statistical Institutes (NSIs) of all the Nordic countries and regions. The tables were accessed programmatically using the APIs provided by the NSIs. Table 3 presents an overview of the key datasets used.
Table 3: Income data used
| Country | Dataset | Indicator | Unit | Time |
|---|---|---|---|---|
| DK | IFOR32 | Avg. equivalised disposable income in decile groups, by decile average, municipality and time | Decile average (DKK) | 1987-2022 |
| FI & AX | 11wh | Income shares (%), means, medians and maximum values of decile and percentile groups by income decile or fractile group, information and year | Disposable cash income per consumption unit, mean (EUR) | 1995-2022 |
| FI & AX | 12hh | Dwelling population by income decile by year, income decile or fractile group, region and information | Household-dwelling unit population in income decile, persons | 1995-2022 |
| FI & AX | 12hh | Dwelling population by income decile by year, income decile or fractile group, region and information | Share of population belonging to the income decile, % | 1995-2022 |
| FO | IP01027 | Equivalent income by percentile intervals, age, sex and region (2009-2021) | Decile average (DKK) | 2009-2021 |
| GL | INEH | Decile distribution of household income by time, municipality, place of residence, number of adults, number of children, decile and type of income | Average equivalised disposable household income (DKK) | 2002-2018 |
| IS | CEN1603 | Census: total equivalised income of private households by income deciles and statistical output area 2011 and 2021 | Mean equalised total income (ISK) | 2011, 2021 |
| NO | 09557 | Persons equivalent income (percentile). Income after tax per consumption unit in NOK nominal, by region, percentile, contents and year | Income after tax per consumption unit (NOK) | 2004-2022 |
| NO | 12558 | Households, by region, income before / after tax, decile group, contents and year | Highest value in decile (NOK) | 2005-2022 |
| NO | 12558 | Households, by region, income before / after tax, decile group, contents and year | Percentage households | 2005-2022 |
| SE | TabVX2DispInkN | Income distribution (fractiles) by region, type of income, distribution measurements, observations and year | Mean equivalised disposable income excluding capital gains (SEK) | 2011-2022 |
More often than not, the statistical definitions, units of measure and presentations of the databases providing information about household income and inequality differed. That made it necessary to pre-process the datasets to enable pan-Nordic comparisons. In general, the process can be summarised as follows:
  1. Retrieve the relevant statistical tables from the NSIs of the Nordic countries and self-governing territories.
  2. Pre-process the data (merge tables, filter and combine variables etc.) to extract or estimate the information on average equivalised net household income by income decile, territorial level and year.
  3. Calculate inequality metrics, including Gini, Atkinson, Theil, Hoover, Herfindahl, Coulter and Dalton indices on original income data in national currency.
  4. Transform income data into Purchasing Power Parity (PPP). That enables more meaningful comparisons of income levels between areas. The EU databases used to perform these transformations are listed in Section 7.3.
The following section outlines the processing steps performed to prepare the data for analysis in each country.

7.2. Country data

7.2.1. Denmark

In the case of Denmark, the following table was used:
  • IFOR32: Avg. equivalised disposable income in decile groups, by decile average, municipality and time
As its name suggests, IFOR32 provides information on average equivalised disposable income by decile average, municipality and time. The information in IFOR32 was used to calculate several inequality metrics, including Gini, Atkinson, Theil, Hoover, Herfindahl, Coulter and Dalton indices. For the sake of comparability between different countries, the average household income values in IFOR32 were also converted into euros using Purchasing Power Parities (PPPs) standards. These values complement the inequality indicators available in table IFOR41. As described below, that information was also used for quality assessment.
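As an illustration of how inequality indices can be derived from decile averages, the following minimal sketch computes four of the metrics named above. It assumes ten equally sized groups and uses textbook formulas; the function names, the Atkinson parameter `epsilon=0.5` and the decile values are hypothetical and are not taken from IFOR32 or from the original processing code. Indices computed on decile means ignore inequality within each decile, so they tend to sit somewhat below indices computed on microdata.

```python
import math


def gini(means):
    """Gini coefficient for equally sized groups described by their mean incomes."""
    x = sorted(means)
    n = len(x)
    # Rank-weighted sum over the ordered values (ranks start at 1).
    ranked_sum = sum(rank * value for rank, value in enumerate(x, start=1))
    return 2 * ranked_sum / (n * sum(x)) - (n + 1) / n


def theil(means):
    """Theil T index; requires strictly positive group means."""
    mu = sum(means) / len(means)
    return sum((v / mu) * math.log(v / mu) for v in means) / len(means)


def hoover(means):
    """Hoover index: share of total income that would have to be redistributed."""
    mu = sum(means) / len(means)
    return sum(abs(v - mu) for v in means) / (2 * sum(means))


def atkinson(means, epsilon=0.5):
    """Atkinson index for an inequality-aversion parameter epsilon != 1."""
    n = len(means)
    mu = sum(means) / n
    # Equally distributed equivalent income.
    ede = (sum(v ** (1 - epsilon) for v in means) / n) ** (1 / (1 - epsilon))
    return 1 - ede / mu


# Hypothetical decile averages in national currency (illustrative only).
deciles = [120_000, 170_000, 200_000, 225_000, 250_000,
           275_000, 305_000, 345_000, 410_000, 700_000]

for name, func in [("Gini", gini), ("Theil", theil),
                   ("Hoover", hoover), ("Atkinson", atkinson)]:
    print(name, round(func(deciles), 4))
```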

7.2.2. Faroe Islands

For the Faroe Islands, we downloaded the following table:
  • IP01027 Equivalent income by percentile intervals, age, sex and region (2009-2021)
IP01027 was used to retrieve information on average income per household. Since the information on equivalised household income is provided by percentiles rather than deciles, we grouped the percentiles into ten categories and then calculated the arithmetic mean for each decile. Subsequently, we computed the Gini, Atkinson, Theil, Hoover, Herfindahl, Coulter and Dalton indices on the basis of the decile averages. As a final step, income values were converted into euros using Purchasing Power Parities (PPPs) standards, thereby simplifying comparability across areas.
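The grouping of percentiles into deciles can be sketched as follows. This is only an illustration: it assumes that the input is a list of 100 percentile means in ascending order and that all percentile groups are equally sized, which may not match the exact layout of the percentile intervals in IP01027. The function name is hypothetical.

```python
def percentiles_to_deciles(percentile_means):
    """Average each consecutive block of ten percentile means into one decile mean."""
    if len(percentile_means) != 100:
        raise ValueError("expected exactly 100 percentile means")
    return [sum(percentile_means[start:start + 10]) / 10
            for start in range(0, 100, 10)]


# Illustrative input: percentile k has mean income k (arbitrary units).
example = list(range(1, 101))
print(percentiles_to_deciles(example))
```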

7.2.3. Finland

In the case of Finland, the following tables were used:
  • 11wh: Income shares (%), means, medians and maximum values of decile and percentile groups by income decile or fractile group, information and year
  • 12hh: Dwelling population by income decile by year, income decile or fractile group, region and information
Statistics Finland does not provide information on equivalised income by income group and municipality. However, the inequality measures can be approximated by combining information on income levels (i.e. average disposable cash income per “consumption unit”) by decile groups at the national level (Table 11wh) with information on the number of persons in each area (region or municipality) in each income category (Table 12hh). That allows for the approximation of a series of inequality measures, including the Gini, Theil, Hoover and Coulter indices, by weighting income levels according to population shares in each income category.
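A minimal sketch of such a population-weighted approximation is given below for the Gini coefficient. It builds a Lorenz curve from national decile incomes and local head counts and applies the trapezoid rule. The function name and the example numbers are hypothetical, and the sketch is not necessarily the exact procedure used to produce the results in this paper.

```python
def weighted_gini(incomes, populations):
    """Gini coefficient for groups with given mean incomes and population counts."""
    pairs = sorted(zip(incomes, populations))  # order groups by income
    total_pop = sum(p for _, p in pairs)
    total_income = sum(y * p for y, p in pairs)
    cum_pop = 0.0
    cum_income = 0.0
    area = 0.0
    for y, p in pairs:
        prev_pop, prev_income = cum_pop, cum_income
        cum_pop += p / total_pop
        cum_income += y * p / total_income
        # Trapezoid under the Lorenz curve for this group (times two).
        area += (cum_pop - prev_pop) * (cum_income + prev_income)
    return 1 - area


# Hypothetical national decile means (EUR) and persons per decile in one area.
national_decile_means = [9_000, 14_000, 17_000, 20_000, 23_000,
                         26_000, 29_000, 33_000, 40_000, 70_000]
local_population = [800, 950, 1_100, 1_200, 1_250,
                    1_200, 1_100, 900, 700, 400]
print(round(weighted_gini(national_decile_means, local_population), 4))
```

Because the income level of each decile is fixed at the national value, differences between areas in this approximation come only from how the local population is distributed across the national deciles.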
We also estimated the mean equivalised household income (i.e. average disposable cash income per “consumption unit”) in each decile by weighting by population shares in each income category and region and referencing those values to the national average. The calculation is based on the income and population volume ratios for each area and year. The formula is given by:
$$ I_{id} = I_{cd} \cdot \left[ \frac{\frac{V_{id}}{\sum_{i=1}^{N} V_{id}}}{\frac{V_{cd}}{\sum_{i=1}^{N} V_{cd}}} \right] \cdot \left[ \frac{\frac{P_{id}}{\sum_{i=1}^{N} P_{id}}}{\frac{P_{cd}}{\sum_{i=1}^{N} P_{cd}}} \right]^{-1} $$
(1)

where $I_{id}$ is the estimated mean equivalised household income for any given income decile $d$ and territorial unit $i$; $I_{cd}$ is the known average income for any given decile $d$, as defined at the national level; $V_{id}$ is the total volume of equivalised household income in any given decile $d$ and territorial unit $i$; $V_{cd}$ is the total volume of equivalised household income in any given decile $d$, as defined in the national distribution; $P_{id}$ is the population in any given income decile $d$ and territorial unit $i$; and $P_{cd}$ is the population in any given income decile $d$ at the national level.
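The structure of Equation 1 can be illustrated with a small numerical sketch. To avoid making assumptions about how the normalising sums are formed, the function below takes the four share terms (income and population shares for the territorial unit and for the country) as precomputed inputs. The function name, parameter names and numbers are hypothetical.

```python
def estimate_decile_income(national_mean,
                           income_share_area, income_share_national,
                           pop_share_area, pop_share_national):
    """Scale the national decile mean by the income-share ratio and
    divide by the population-share ratio, following Equation 1."""
    income_ratio = income_share_area / income_share_national
    population_ratio = pop_share_area / pop_share_national
    return national_mean * income_ratio / population_ratio


# Hypothetical values: the decile holds 12% of local income but 11% of the
# local population, against 10% and 10% nationally.
estimate = estimate_decile_income(20_000, 0.12, 0.10, 0.11, 0.10)
print(round(estimate, 2))
```

When the local shares equal the national shares, both ratios are one and the estimate collapses to the national decile mean.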
Those income estimates were used to calculate the inequality metrics that cannot be derived from weighted values, namely the Atkinson, Herfindahl and Dalton indices. As usual, we also transformed the average household income estimates into Purchasing Power Parities (PPPs) standards to facilitate comparability across areas.

7.2.4. Greenland

In the case of Greenland, the following tables were used:
  • INXFI401: Decile distribution of income for families by municipality and type of income
  • INXFI101: Family income by municipality and type of income
Greenland provides information on income levels by decile groups, municipalities and time (INXFI401). However, since the information in this table is not equivalised, we had to calculate it using the information on family structures (number of adults and children) from the same table. We applied the modified OECD equivalence scale (Eurostat, 2024), which attributes a weight to all members of the household:
  • 1.0 for the first adult;
  • 0.5 for the second and each subsequent person aged 14 and over;
  • 0.3 for each child aged under 14.
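The weights listed above translate directly into a small helper. The sketch below is illustrative; the function names and the example household are hypothetical.

```python
def equivalised_size(persons_14_and_over, children_under_14):
    """Household size on the modified OECD scale."""
    if persons_14_and_over < 1:
        raise ValueError("a household needs at least one person aged 14 or over")
    # 1.0 for the first adult, 0.5 for each further person aged 14+, 0.3 per child.
    return 1.0 + 0.5 * (persons_14_and_over - 1) + 0.3 * children_under_14


def equivalise(household_income, persons_14_and_over, children_under_14):
    """Household income divided by the equivalised household size."""
    return household_income / equivalised_size(persons_14_and_over, children_under_14)


# Hypothetical household: two adults, two children under 14, income 420,000.
print(equivalised_size(2, 2))
print(round(equivalise(420_000, 2, 2), 2))
```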
We then computed the inequality indicators, namely the Gini, Atkinson, Theil, Hoover, Herfindahl, Coulter and Dalton indices, using the equivalised household income by decile. Those values were validated by comparing them to the equivalised disposable household income by municipality in INXFI101. In general, the deviation between both values was small, with the greatest differences found in some areas of Kommune Kujalleq, where the maximum relative deviation reached -15 percent for one observation. In the remaining areas, the differences were smaller (averaging -3.18 percent).

7.2.5. Iceland

In the case of Iceland, the following tables were collected:
  • TEK01002: Income by municipalities and sex 1990-2022 – division into municipalities as of 1 January 2023
  • TEK01003: Income by municipalities and sex 1990-2022 – current municipalities
  • CEN1603: Census: total equivalised income of private households by income deciles and statistical output area 2011 and 2021
Iceland provides aggregated income data at the municipal level (TEK01002 and TEK01003). However, the income data in those tables are not classified by income deciles, nor are the values equivalised by household size. Alternatively, the table CEN1603 provides data on equivalised income of private households by income deciles and region. However, those data come from population censuses and are only available for Statistical Output Areas in 2011 and 2021. Further, they are provided as total household income instead of net household income.
As a first processing step, the equivalised household income figures in CEN1603 were transformed from total to net values by applying decile-specific scaling factors loosely reflecting taxation data in Iceland. The ratios applied ranged from 0.63 for decile 1 to 0.53 for decile 10. The inequality metrics were calculated on the basis of those values. Additionally, the mean and median equivalised household income values were further re-scaled so that the estimated aggregated values at the national level matched those available in the Eurostat table Mean and median income by age and sex – EU-SILC and ECHP surveys (ilc_di03). The re-scaling factors used were 0.682 for the mean and 0.732 for the median. Moreover, to enable currency transformation to euros, it was also necessary to interpolate the EUR/ISK exchange rates for the period 2009-2017. Finally, all income values were converted into Purchasing Power Parities (PPPs) standards.
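The re-scaling to a national reference value can be sketched as below. The sketch assumes that the national aggregate is a population-weighted mean of the area values and that a single multiplicative factor is applied to all areas; the function name and the numbers are hypothetical and do not reproduce the factors reported above.

```python
def rescale_to_reference(values, weights, reference_mean):
    """Return the scaling factor and the re-scaled values so that the
    weighted mean of the values equals the reference mean."""
    estimated_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)
    factor = reference_mean / estimated_mean
    return factor, [v * factor for v in values]


# Hypothetical area means, populations and national reference mean.
area_means = [5_000_000, 6_000_000, 7_500_000]
area_population = [10_000, 25_000, 15_000]
factor, rescaled = rescale_to_reference(area_means, area_population, 4_200_000)
print(round(factor, 3), [round(v) for v in rescaled])
```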

7.2.6. Norway

The analysis for Norway draws on the following tables:
  • 12558: Households, by region, income before / after tax, decile group, contents and year
  • 09557: Persons equivalent income (percentile). Income after tax per consumption unit in NOK nominal, by region, percentile, contents and year
  • 06081: Persons in private households, by region, type of household, contents and year
Table 12558 provides information on the number of households by municipality and region, income before and after tax, decile group, contents and year. However, the table does not include average income values within each percentile and the income information is categorised by the highest value in each decile. Moreover, the values are not equivalised according to the characteristics of the households. Table 09557 provides information about average income after tax per “consumption unit” in NOK. The information is available by county and, as in the previous case, the income values for the upper percentiles are not provided. Finally, Table 06081 provides information on household composition by area.
The data processing started by combining the three tables and estimating the equivalised household income values in Table 12558 by weighting the after-tax income values by the equivalised household sizes, according to Table 06081. To calculate the equivalent household sizes, we applied the modified OECD equivalence scale (Eurostat, 2024) as follows:
  • 1.0 for the first adult;
  • 0.5 for the second and each subsequent person aged 14 and over;
  • 0.3 for each child aged under 14.
Given that neither Table 12558 nor Table 09557 provide income values for the upper income deciles, those were estimated using a cubic extrapolation spline (Hyman method).
In the case of 09557, however, the mean equivalised household income estimates for the upper decile were further increased by 30%. That was done to further align the Gini coefficients calculated based on these values with those provided by the NSI.
Subsequently, we combined both tables and calculated the average household income for each area as a weighted mean of the equivalised income values reported in 09557, considering the population shares in 12558 (see Equation 1). We also estimated a series of inequality measures, including the Gini, Theil, Hoover and Coulter indices, using that same weighting approach.
Further, we defined median equivalised income in each statistical unit as the upper threshold in the fifth income decile and calculated unweighted Atkinson, Herfindahl and Dalton inequality indices. Finally, all income values were converted into euros and into euros expressed in Purchasing Power Parities (PPPs) standards.

7.2.7. Sweden

For Sweden we used the following table:
  • TabVX2DispInkN: Income distribution (fractiles) by region, type of income, distribution measurements, observations and year
TabVX2DispInkN provides information on various types of income. We focused on mean equivalised disposable income excluding capital gains, measured in SEK thousands. Those values are provided as upper bound values, alongside the mean values for each income decile. The mean income values were used to compute the Gini, Atkinson, Theil, Hoover, Herfindahl, Coulter and Dalton inequality metrics on the decile averages. The median equivalised income in each unit was obtained from the upper threshold in the fifth income decile, whereas the income values for the upper threshold in the 10th decile were estimated using a cubic extrapolation spline (Hyman method). As a final step, income values were converted into euros using Purchasing Power Parities (PPPs) standards.

7.3. Eurostat and ECB

In addition to the country-level statistics, the following complementary tables were downloaded from official data repositories in the EU:
  • The European Central Bank’s (ECB) reference exchange rates for Nordic currencies
  • Purchasing Power Parities (PPPs), price level indices and real expenditures for ESA 2010 aggregates (prc_ppp_ind)
  • Mean and median income by household type – EU-SILC and ECHP surveys (ilc_di04)
  • Mean and median income by age and sex – EU-SILC and ECHP surveys (ilc_di03)
  • Gini coefficient of equivalised disposable income by age (ilc_di12)
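As a rough illustration of how the exchange-rate and PPP tables listed above enter the conversions, the sketch below assumes that the ECB reference rate is quoted as national currency units per euro and that the PPP series is expressed as national currency units per unit of purchasing power standard. The function names and the numerical values are hypothetical and are not actual rates.

```python
def to_euro(value_national, currency_units_per_euro):
    """Convert a national-currency amount into euros at a reference exchange rate."""
    return value_national / currency_units_per_euro


def to_ppp_adjusted(value_national, ppp_units_per_standard):
    """Convert a national-currency amount into PPP-adjusted units."""
    return value_national / ppp_units_per_standard


# Hypothetical inputs for one country and year (not actual rates).
income_national = 300_000.0
exchange_rate = 7.45   # assumed national currency units per euro
ppp = 9.50             # assumed national currency units per PPP-adjusted unit

print(round(to_euro(income_national, exchange_rate), 2))
print(round(to_ppp_adjusted(income_national, ppp), 2))
```

A PPP value above the exchange rate, as in this made-up case, corresponds to a price level above the reference area, so the PPP-adjusted income is lower than the income converted at the exchange rate.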

7.4. Data validation

The inequality metrics from NSIs (shown in Table 4) were used to validate the calculations described in the previous section.
Table 4: Inequality indicators provided by the NSIs
| Country | Dataset | Indicators | Unit | Time |
|---|---|---|---|---|
| DK | IFOR41 | Inequality indicators on equivalised disposable income | Gini coefficient, Hoover index, P90/10 (Upper decile boundary divided by lower), S80/S20 (Income quintile share ratio) | 1987-2022 |
| FI | 127l | Income differences and equalising impact of current transfers on income differences in the dwelling population by year, region and information | Gini coefficient | 1995-2022 |
| FO | IP01010 | Gini and Hoover indexes and income quantile ratios by age, sex, type of household and region (2009-2021) | Gini index, Hoover index, income quantile ratio 90/10, income quantile ratio 99/1, income quintile ratio 80/20, population | 2009-2021 |
| GL | INXIU101 | Inequality indicators on equivalised disposable income by indicator | At-risk-of-poverty rate. 40 per cent threshold, at-risk-of-poverty rate. 50 per cent threshold, at-risk-of-poverty rate. 60 per cent threshold, Gini coefficient, S80/20 | 2002-2022 |
| IS | LIF01110 | Gini coefficient and quintile share ratio 2004-2022 | Gini coefficient | 2004-2022 |
| NO | 09114 | Measures of income dispersion. household equivalent income (EU-scale) between persons, by region, contents and year | Gini coefficient, P90/P10 | 2004-2022 |
| SE | TabVX1DispInkN | Income inequality indicators by region, type of income, observations and year | At-risk-of-poverty rate, percent, Gini coefficient, high economic standard, percent, mean value, SEK thousands, median at-risk-of-poverty gap, percent, median value, SEK thousands, number of persons, P05/P50, P10/P50, P20/P50, P80/P20, P80/P50, P90/P10, P90/P50, P95/P05, P95/P50 | 2011-2022 |
The validation was done by looking at the deviations between the official Gini coefficients and those calculated on the basis of the mean equivalised household income by decile. Figure 17 provides an overview of the absolute deviations of our Gini coefficients in relation to the official indicators provided by the NSIs of the Nordic countries.
Figure 17: Comparison of inequality metrics by source
Note: The NSIs of Iceland and Greenland do not provide regionalised inequality metrics.
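The deviation check described above can be sketched as follows. The example assumes that both sources are held as dictionaries keyed by (area, year) and that only keys present in both sources are compared; the function names and the values are hypothetical.

```python
def gini_deviations(official, computed):
    """Deviation (computed minus official) for every key found in both sources."""
    shared = sorted(official.keys() & computed.keys())
    return {key: computed[key] - official[key] for key in shared}


def max_abs_deviation(deviations):
    """Largest absolute deviation across all compared observations."""
    return max(abs(d) for d in deviations.values())


# Hypothetical Gini coefficients keyed by (area, year).
official = {("Area A", 2020): 0.27, ("Area B", 2020): 0.31}
computed = {("Area A", 2020): 0.26, ("Area B", 2020): 0.29, ("Area C", 2020): 0.24}

deviations = gini_deviations(official, computed)
print(deviations)
print(round(max_abs_deviation(deviations), 4))
```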
As seen in Figure 18, the larger deviations in Gini scores in Greenland do not seem to be affected by the data imputation step performed. Rather, the deviation seems to be due to the use of different statistical presentations of income measures in this region.
Figure 18: Comparison of inequality metrics by source
The analysis based on income values is only possible for the units where both official statistics and our own calculations are simultaneously available. However, it can be used as an indication of the overall precision of the estimates for areas where official information is not available. Moreover, since the inequality indices are calculated based on income data, the assessment also provides information about the consistency of the decile income estimates. Figure 19 shows a comparison of the aggregated mean equivalised household income values estimated in this work and those provided by the NSIs and Eurostat (ilc_di04).
Figure 19: Comparison of household income data by source

7.5. Data limitations

The data preparation steps described in this annex highlight several caveats and limitations of the municipal income and inequality statistics available from the Nordic NSIs:
  • Non-harmonised statistical definitions: Income data are not entirely comparable across countries and territorial levels. In some cases, the data aggregate various sources of income, including capital gains and similar streams, while other data only include cash income. Most frequently, income data are provided as net income (after tax), but in some cases they are given as total income (before tax). Some data are provided as equivalised monetary values (i.e. household income divided by the equivalised number of household members), whereas other NSIs provide non-equivalised household income. Additionally, some income datasets are segmented by deciles or similar groupings, while others provide aggregated income inequality data alongside mean or median values for the entire statistical unit.
  • Persistent data gaps: Virtually all the income datasets available from the NSIs of the Nordic countries have data gaps. That may be due to the discontinuation of various indices and/or modifications of internal territorial definitions and geographical delimitations within each country. Since some of the statistical methods applied in this work require long data series, such gaps had to be filled using data interpolation and/or imputation methods.
  • Lack of robust measures of income inequality: Virtually all the Nordic NSIs provide Gini coefficients at various territorial levels. However, only a few offices provide alternative indices such as Theil and Atkinson indices. Often, those complementary metrics are not the same or are not calculated on the basis of the same data across regions.
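As an illustration of the gap filling mentioned in the second point above, the sketch below fills interior gaps in a yearly series by linear interpolation. Linear interpolation is an assumption made for this example only, since the specific interpolation and imputation methods are not detailed here. The function name and the values are hypothetical.

```python
def fill_gaps_linear(series):
    """Fill interior gaps (None) in a {year: value} series by linear interpolation.
    Leading and trailing gaps are left untouched."""
    years = sorted(series)
    known = [y for y in years if series[y] is not None]
    filled = dict(series)
    # Walk over consecutive pairs of observed years and fill what lies between.
    for left, right in zip(known, known[1:]):
        for year in years:
            if left < year < right:
                t = (year - left) / (right - left)
                filled[year] = series[left] + t * (series[right] - series[left])
    return filled


# Hypothetical Gini series with two missing years in the middle.
gini_series = {2009: 0.24, 2010: None, 2011: None, 2012: 0.27, 2013: None}
print(fill_gaps_linear(gini_series))
```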
All those factors hinder an accurate analysis of the structural characteristics and development of income inequalities in the Nordic Region.