Differences in the Income Distribution of Households Run by Men and Women by Voivodeships

Abstract Research background: Household income depends on its demographic composition, age and education of its members, place of residence and many other factors. In our work, we concentrate on the income distribution of Polish households. Purpose: The study aims to compare the household income distributions in Polish voivodeships, taking into account the gender of the family head. We provide evidence on the magnitude and determinants of regional differences in gender-specific income disparities. Research methodology: In order to move beyond estimation based on mean values, we apply the Residual Imputation Approach and extend the Oaxaca-Blinder decomposition procedure to different quantile points along the income distribution. To describe the differences between two income distributions we construct a counterfactual distribution and decompose the inequalities into explained and unexplained components. Results: The regional variation of the gender income gap has been explained with individual and job-related characteristics. There exists an important diversity in the size of the gender income gap across the Polish provinces. The results obtained for 16 voivodeships allowed us to group them into four clusters: heavily industrialized voivodeships with a large income gap, weakly industrialized with a low income gap, voivodeships with large agglomerations characterized by a low gap, and medium-developed voivodeships with a large, U-shaped gap. Novelty: Our results provide novel insights into the regional dimension of the income gap.


Introduction
Income inequality in Poland increased significantly during its transformation to a market economy. According to the Household Budget Survey (HBS) data, the Gini index for Poland stands at about 0.298 (Statistics Poland, 2018). In 2017, available income per capita was higher than the national average in six voivodeships: Mazowieckie, Zachodniopomorskie, Pomorskie, Śląskie, Dolnośląskie, and Wielkopolskie. The highest average income per capita was recorded for households from the Mazowieckie Voivodeship and it was 19.6% higher than the national average income per capita. The lowest income was registered in the Podkarpackie Voivodeship (21.5% below the average).
Often in the analysis of income inequality, the main focus is the gender pay gap.
The findings of numerous empirical studies show that men earn higher wages than women.
A similar observation may apply to the income of households run by women and men.
One aspect of the gender pay gap that has received very little attention so far concerns its regional dimension. Empirical studies mostly focus on the national level (for example, Jurajda, 2003 for the Czech Republic and Slovakia; Pena-Boquete, De Stefanis, Fernandez-Grela, 2010 for Italy and Spain; Chatterji, Mumford, Smith, 2011 for the United Kingdom; Śliwicki, Ryczkowski, 2014 for Poland) or adopt a cross-country perspective (Olivetti, Petrongolo, 2008; Hedija, 2017; Boll, Lagemann, 2019). However, the gender pay gap varies considerably within a country. In Polish provinces (NUTS2-regions), the unadjusted gender pay gap calculated for the total annual salary ranges between 1.3 and 24.8% and the gap calculated for the hourly wage ranges between -16.4 and 14.9%, which means that in some regions women even earn more than men (Śliwicki, 2017). Surprisingly, little is known about the mechanisms that drive these regional disparities. Regions differ substantially in their sectoral composition, providing different employment possibilities for men and women. Regional disparities are enhanced by the differences in workers' characteristics across regions. Research on these issues is of great importance. Comprehensive evidence on regional gender income disparities in Poland is missing.
Currently, we can observe the rapid development of microeconometric techniques for studying differences between groups of objects. Since the seminal works of R. Oaxaca (1973) and A. Blinder (1973), many procedures that go beyond the simple decomposition of differences between average values have been proposed (see e.g. Juhn, Murphy, Pierce, 1993; DiNardo, Fortin, Lemieux, 1996; Gosling, Machin, Meghir, 2000; Donald, Green, Paarsch, 2000; Machado, Mata, 2005; Autor, Katz, Kearney, 2005). The main advantage of modern decomposition methods is that they help to uncover the factors affecting, for example, changes in the distribution of wages. The differences in income distributions of men and women are also analyzed. The research conducted in Poland regarding the gender pay gap is mostly limited to decomposing the average level of wage differences (e.g. Słoczyński, 2012; Goraus, Tyrowicz, 2014; Śliwicki, Ryczkowski, 2014) and only a few studies go beyond the mean decomposition.
In the paper, we analyze regional income inequalities in terms of the household income distribution. The main objective of our study is to compare the distribution of household income in the Polish voivodeships (in 16 NUTS2-regions), taking into account the gender differences in family heads. In order to move beyond estimations based on mean values, we apply the Residual Imputation Approach (Juhn, Murphy, Pierce, 1993). We argue that employing this technique can provide deeper insights into the nature of income differentials. This paper explains the regional variation of the gender income gap in Poland with individual and establishment characteristics. We employ data from the Household Budget Survey (HBS) for Poland in 2015.
The paper is organized as follows. Section 2 describes the techniques used for the decomposition of inequalities. In section 3, we present data and the results of the decomposition analyses for Poland and the single regions. Section 4 discusses the results and offers some concluding remarks.

Method of the analysis
In the analysis of income inequality, it may be relevant to assign inequality contributions to various population subgroups associated with various socioeconomic characteristics of individuals. Let $Y_g$ denote the outcome variable in group $g$ (e.g. the personal income in the men's group, $g = M$, or in the women's group, $g = W$) and $X_g$ the vector of individual socioeconomic characteristics of people in group $g$ (e.g. age, education level, work experience).
The expected value of $Y$ conditional on $X$ is a linear function, $E(Y_g \mid X_g) = X_g \beta_g$, $g = M, W$, where $\beta_g$ are the returns to the characteristics. The Oaxaca-Blinder decomposition can be applied whenever we need to explain the differences between the expected values of the dependent variable $Y$ in two comparison groups (Oaxaca, 1973; Blinder, 1973). Written with women's coefficients as the reference, which matches the counterfactual used for the distributional decomposition, it reads:

$$\hat{\Delta}^{\mu} = \bar{X}_M \hat{\beta}_M - \bar{X}_W \hat{\beta}_W = \underbrace{\bar{X}_M (\hat{\beta}_M - \hat{\beta}_W)}_{\text{unexplained}} + \underbrace{(\bar{X}_M - \bar{X}_W)\, \hat{\beta}_W}_{\text{explained}}.$$

The explained effect expresses the difference in the potentials of people in the two groups and presents the effect of characteristics. The unexplained effect is the result of differences in the estimated parameters, and so in the "prices" of individual characteristics of the group's representatives. It can be interpreted as labor market discrimination. A detailed decomposition may also be calculated. The disadvantage of this approach is that it focuses solely on average effects, which may result in a misleading assessment if the effects of covariates vary across the income distribution.
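The two-fold mean decomposition described above can be illustrated with a minimal sketch. It assumes OLS regressions with an intercept and uses women's coefficients as the reference for the explained part; the function names and the synthetic data are hypothetical and are not taken from the HBS.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X @ beta + error."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def oaxaca_blinder(X_m, y_m, X_w, y_w):
    """Two-fold decomposition of mean(y_m) - mean(y_w).

    X_m and X_w must already contain a column of ones (intercept).
    Returns (raw_gap, explained, unexplained).
    """
    beta_m, beta_w = ols(X_m, y_m), ols(X_w, y_w)
    xbar_m, xbar_w = X_m.mean(axis=0), X_w.mean(axis=0)
    raw_gap = y_m.mean() - y_w.mean()
    explained = (xbar_m - xbar_w) @ beta_w    # effect of characteristics
    unexplained = xbar_m @ (beta_m - beta_w)  # effect of coefficients
    return raw_gap, explained, unexplained

# Synthetic illustration (made-up numbers, one covariate standing in for age).
rng = np.random.default_rng(0)
n = 500

def make_group(age_mean, intercept, slope):
    age = rng.normal(age_mean, 10, n)
    X = np.column_stack([np.ones(n), age])
    y = intercept + slope * age + rng.normal(0, 0.3, n)  # log income
    return X, y

X_m, y_m = make_group(45, 7.0, 0.010)
X_w, y_w = make_group(47, 6.9, 0.010)
raw_gap, explained, unexplained = oaxaca_blinder(X_m, y_m, X_w, y_w)
print(raw_gap, explained, unexplained)
```

Because each regression includes an intercept, the fitted mean equals the sample mean in each group, so the explained and unexplained parts add up to the raw gap.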
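The objective of this study is to extend the Oaxaca-Blinder decomposition procedure to different quantile points along the income distribution. Let $F_{Y_g}(y)$ be the distribution function for the variable $Y$ in group $g$, which can be expressed using the conditional distribution $F_{Y_g \mid X_g}(y \mid X)$ of $Y$ and the distribution $F_{X_g}(X)$ of the characteristics $X$:

$$F_{Y_g}(y) = \int F_{Y_g \mid X_g}(y \mid X)\, dF_{X_g}(X), \quad g = M, W.$$

To calculate the differences between the two distributions we have to construct a counterfactual distribution that mixes the conditional distribution for the outcome variable $Y$ with various distributions for the explanatory variables $X$:

$$F_{Y_W^C}(y) = \int F_{Y_W \mid X_W}(y \mid X)\, dF_{X_M}(X).$$

A counterfactual distribution is the distribution of incomes that would prevail for people in group $W$ if they had the distribution of characteristics of group $M$. Then the observed difference between the two distributions may be decomposed into:

$$F_{Y_M}(y) - F_{Y_W}(y) = \left[F_{Y_M}(y) - F_{Y_W^C}(y)\right] + \left[F_{Y_W^C}(y) - F_{Y_W}(y)\right],$$

where the first bracket is the unexplained component and the second is the explained component. The main difference to the Oaxaca-Blinder decomposition is that this decomposition refers to full distributions, rather than just to their means.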
Several approaches have been suggested in the literature for estimating the counterfactual distribution (e.g., DiNardo, Fortin, Lemieux, 1996; Fortin, Lemieux, 1998; Donald, Green, Paarsch, 2000; Machado, Mata, 2005; Fortin, Lemieux, Firpo, 2010). An approach proposed by Ch. Juhn, K.M. Murphy and B. Pierce (1993) is based on the Residual Imputation Procedure. The implementation of the procedure is two-step. In the first step, the residuals are replaced by counterfactual residuals under the assumption of rank preservation:

$$Y^{C,1}_{Mi} = X_{Mi}\beta_M + v^{C,1}_{Mi}, \quad i = 1, \ldots, n_M,$$

where $v^{C,1}_{Mi} = F^{-1}_{v_W \mid X}\left(\tau_{Mi}(X_i), X_i\right)$ and $\tau_{Mi}(X_i)$ is the conditional rank of $v_{Mi}$ in the distribution of residuals for men. In the second step, the counterfactual returns to observables are imputed as

$$Y^{C,2}_{Mi} = X_{Mi}\beta_W + v^{C,1}_{Mi}, \quad i = 1, \ldots, n_M.$$

In this study, after the assessment of the income gap in voivodeships, an attempt was made to group them using the hierarchical clustering method. Hierarchical clustering is a widely used data mining tool for grouping data into clusters that exposes similarities or dissimilarities in the data. We have applied the agglomerative method of hierarchical clustering, in which at each step of the clustering process an observation or cluster is merged into another cluster. The calculation of the new distance was carried out using the average-linkage cluster analysis method based on the Euclidean metric.
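The agglomerative clustering step can be sketched with SciPy. The sketch also scores the four linkage methods that the empirical section compares (single, complete, average, Ward) by their cophenetic correlation. The feature matrix here is random and purely hypothetical: it assumes each of the 16 regions is described by three numbers (raw, explained and unexplained gap), which may not match the exact inputs used in the study.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical inputs: 16 regions x 3 gap measures (random placeholders).
rng = np.random.default_rng(1)
features = rng.normal(size=(16, 3))

# Pairwise Euclidean distances between regions (condensed form).
distances = pdist(features, metric="euclidean")

# Cophenetic correlation for each candidate linkage method.
scores = {}
for method in ("single", "complete", "average", "ward"):
    tree = linkage(features, method=method, metric="euclidean")
    coefficient, _ = cophenet(tree, distances)
    scores[method] = coefficient

# Keep the method whose dendrogram best preserves the original distances,
# then cut its tree into at most four clusters.
best_method = max(scores, key=scores.get)
best_tree = linkage(features, method=best_method, metric="euclidean")
labels = fcluster(best_tree, t=4, criterion="maxclust")
print(best_method, labels)
```

A higher cophenetic correlation means the distances implied by the dendrogram track the original pairwise distances more closely, which is the selection criterion the paper reports.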

Data used and results of the empirical study
Our analysis rests on data from the Household Budget Survey (HBS) for Poland in 2015, which records characteristics of the household head such as age, education level, etc. The average monthly disposable equivalent income of a household run by a man was PLN 2,420.11, while for a household run by a woman it was PLN 2,329.45. The data indicate that income inequalities between households run by men and women can be observed across voivodeships (see Table 1).
The lowest average monthly income was observed in the Podkarpackie Voivodeship, while the Mazowieckie Voivodeship was the richest region, with the highest average income. The maximum inequality was recorded in the Lubuskie Voivodeship. In two provinces the income gap was negative (in the Podlaskie and Podkarpackie voivodeships). In our further empirical decomposition analysis, the logarithm of the average monthly disposable equivalent income constitutes the outcome variable. We follow the seminal work of R. Oaxaca (1973) and A. Blinder (1973) and decompose the unadjusted log income gap into an explained and unexplained part for Poland as well as for all 16 NUTS2-regions. The explanatory factors include individual and company determinants that relate to the head of the household (see Table 2). We establish seven explanatory variables in our models. As individual characteristics, the age, education level and place of residence of the household head were included. As job-related characteristics, working time, contract type and ownership of the company were taken into account.
It is useful to look at summary statistics for some covariates (in Table 3).
We have found that there is a positive difference between the mean values of log incomes for men's and women's households. The mean log income differential equals 0.042 (that is, the difference between men's and women's household income is about 4.2%).
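As a quick check on the percentage reading, a log differential $d$ corresponds to a relative difference of $e^{d} - 1$ (in geometric-mean terms, since the differential is a difference of mean logs), so for small $d$ the two nearly coincide:

```latex
\[
e^{0.042} - 1 \approx 0.0429
\]
```

That is roughly 4.3%, close to the 4.2% approximation quoted above.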
The difference between the mean log income values was decomposed into two components: the first reflects the contribution of differences in the model coefficients (the unexplained part), and the second reflects the contribution of differences in the attributes (the explained part). The unexplained effect is positive and gives us information about discrimination. The explained gap is negative, which means that the difference in the average log incomes between households is reduced by the better characteristics of women's households.
Voivodeship heterogeneity is not limited to the size of the gap but also concerns its composition. The unexplained effect is large for the voivodeships with a high raw differential and small for the provinces with a low raw differential. This part of the gender income gap is nowhere negative and gives us information about discrimination. The explained gap is negative in 12 provinces (among others in Mazowieckie, Podkarpackie and Łódzkie).
The negative value of this component means that the difference in the average log incomes between men's and women's households is reduced by the better women's characteristics. In 2 voivodeships, the explained part is positive, that is, it increases the overall gap (in Warmińsko-Mazurskie and Opolskie). Our results provide novel insights into the regional dimension of the income gap. The observed characteristics play very different roles across the regions, featuring a higher explanatory power in regions with a low gap.
Since the Oaxaca-Blinder technique focuses only on average effects, we decomposed inequalities along the distribution of log incomes for men and women using the residual imputation approach (JMP-approach). The total differences between the values of log incomes have been calculated and the results are shown in Table 5. They are expressed in terms of percentiles (the symbols p5, …, p95 mean 5th, …, 95th percentile; e.g. the 25th percentile is the log income value below which 25% of observations may be found). For each voivodeship, the differences between the values of the log incomes of men's and women's households along the whole log income distribution were calculated. Then the differences were decomposed into the sum of the unexplained and explained components (the results are presented in Figures 3, 4, 5, 6).
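The percentile-by-percentile decomposition can be sketched as follows. This is a simplified illustration, not the study's implementation: it assumes residual ranks are unconditional (residuals independent of the covariates), builds the counterfactual from men's characteristics with women's coefficients and women's residual distribution, and uses synthetic data with hypothetical names.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X @ beta + error."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def jmp_counterfactual(X_m, y_m, X_w, y_w):
    """Counterfactual log incomes: men's X, women's returns and residuals."""
    beta_m, beta_w = ols(X_m, y_m), ols(X_w, y_w)
    v_m = y_m - X_m @ beta_m
    v_w = y_w - X_w @ beta_w
    # Rank of each men's residual within the men's residual distribution.
    ranks = (np.argsort(np.argsort(v_m)) + 0.5) / len(v_m)
    # Step 1: residual at the same rank in the women's residual distribution.
    v_c1 = np.quantile(v_w, ranks)
    # Step 2: replace men's returns with women's returns.
    return X_m @ beta_w + v_c1

def decompose_at_percentiles(y_m, y_w, y_c, probs):
    """Return (total, explained, unexplained) gaps at the given percentiles."""
    q_m, q_w, q_c = (np.quantile(y, probs) for y in (y_m, y_w, y_c))
    return q_m - q_w, q_c - q_w, q_m - q_c

# Synthetic illustration (made-up numbers, one covariate).
rng = np.random.default_rng(2)

def make_group(n, x_mean, intercept, slope, sigma):
    x = rng.normal(x_mean, 10, n)
    X = np.column_stack([np.ones(n), x])
    return X, intercept + slope * x + rng.normal(0, sigma, n)

X_m, y_m = make_group(800, 45, 7.0, 0.010, 0.35)
X_w, y_w = make_group(600, 47, 6.9, 0.010, 0.30)

probs = np.array([0.05, 0.25, 0.50, 0.75, 0.95])  # p5, p25, p50, p75, p95
y_c = jmp_counterfactual(X_m, y_m, X_w, y_w)
total, explained, unexplained = decompose_at_percentiles(y_m, y_w, y_c, probs)
print(np.round(total, 3), np.round(explained, 3), np.round(unexplained, 3))
```

By construction, the explained and unexplained components sum to the total gap at every percentile, mirroring the decomposition reported for each voivodeship.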
After assessing the income gap (the raw, the explained and the unexplained gap) for all 16 provinces, we grouped them using hierarchical clustering methods. In the first step, the grouping was carried out by four methods: single, complete, average and Ward's linkage with Euclidean distance. Then the best classification method was selected using the cophenetic correlation coefficient (the calculated values of this coefficient were: 0.633 for single, 0.585 for complete, 0.665 for average and 0.594 for Ward's linkage). We chose the average-linkage clustering algorithm, which allowed us to group the voivodeships into clusters as shown in Figure 2.
Group 1 consists mainly of the highly developed provinces with high GDP per capita. The Dolnośląskie and Śląskie voivodeships are highly urbanized and industrialized regions. The most important regional industries are mining, metallurgy, the power industry, car manufacturing, and engineering. These are the richest provinces in Poland, with high wages and incomes, as well as low unemployment rates. Also highly industrialized are Łódzkie (textile and power industry) and Lubuskie (automotive, wood and furniture industry, food processing, services and transport), with medium incomes. The Pomorskie Voivodeship is a region with one of the lowest percentages of people working in agriculture and one of the highest shares of employment in the service sector, with high wages and incomes.
Highly industrialized regions usually offer well-paid jobs for men. Therefore, in these five provinces, the total income gap is large. However, the gap has a decreasing shape: it falls as we move toward the top of the income distribution. The unexplained part is positive, reflecting the effect of discrimination on the labor market. Its share is high over the whole range of the income distribution. The explained differential (the effect of characteristics) is negative, which means that the characteristics of the two groups reduce the inequalities.
In the medium-developed voivodeships, the total income gap is large and U-shaped (bigger at the bottom and the top of the distribution, indicating sticky floor and glass ceiling effects). For the lower income ranges, the explained effect is bigger than the unexplained effect. Its positive values mean that the worse characteristics of women's households increase the income inequalities at the bottom of the log income distribution. For higher income ranges, the unexplained effect dominates (with the exception of the Opole region).

Conclusions
The objective of the study was to compare the distribution of household income in 16 Polish voivodeships, taking into account the gender differences in family heads. We expected that employing the advanced decomposition technique would provide deeper insights into the nature of income differentials. The obtained results confirm the regional variation of the gender income gap in the researched voivodeships in Poland. Men's households have much higher incomes than women's households in industrialized regions. The largest inequalities were recorded in the most affluent regions: the Śląskie, Opolskie, Dolnośląskie, Lubuskie, Łódzkie, Pomorskie and Warmińsko-Mazurskie voivodeships. Similar results were obtained by T. Słoczyński (2012), who provided a description and an explanation of regional variation in gender wage gaps in Poland, and showed that the gap is especially large in the Śląskie Voivodeship. He found strong gender wage discrimination in this region. Moreover, men there typically work in bigger firms than women, and the distribution of women between different occupations and industries in Silesia is especially disadvantageous for their relative wages. A. Jędrzejczak (2015) believes that this kind of income inequality could be the result of the rapid economic growth in some regions that took place in the transformation period.
In order to move beyond an estimation based on mean values, we extended the decomposition procedure to different quantile points along the whole income distribution using the Residual Imputation Approach. Like J. Albrecht, A. Björklund and S. Vroman (2003) for Sweden, we have found sticky floor and glass ceiling effects in some voivodeships. Even before our study, A. Newell and M. Socha (2005) showed that many of the factors influencing wages, including gender, have a stronger impact in higher quantiles of the wage distribution. M. Rokicka and A. Ruzik (2010) found that the inequality of earnings between women and men tends to be larger at the top of the earnings distribution. The novelty of our study is that we analyzed the inequalities between household income distributions taking into account regional differences.
After assessing the raw, the explained and the unexplained gaps for all 16 NUTS2-regions, we grouped them into clusters using the hierarchical clustering method. We have shown that there is an important diversity in the size of the gender income gap and in its underlying causes across the 16 Polish provinces. Such a gap is the result of many factors. The findings of this study depend on the data set used, the number of explanatory variables and the applied method of decomposition. Future studies will address these aspects.
The obtained results can be useful in helping social policymakers better understand the influence of various socioeconomic determinants on income inequality. This leaves room for them to implement policies that would rebalance the situation in the labor market and decrease the economic divergences between particular regions in Poland.