The Impact of Gender on Unemployment: Cross‐country and Within‐country Analysis of the European Labour Markets during Economic Recession

This paper investigates the impact of gender on the individual probability of being unemployed and makes a cross‐country comparison across 13 European countries during the European recession. Applying a general logit model for each country and capital, whilst controlling for the year, as well as for individual and regional characteristics, the probability of unemployment was estimated using individual labour force data from 2011 to 2014. Cook’s distance is used to examine the differences between labour markets of capital regions (or cities) and non‐capital regions. Using the size of Cook’s distance, models are calibrated, and models which include the degree of urbanization and occupation type are evaluated. The results are presented in the form of a spatial map and show that gender affects the probability of unemployment in the majority of the analysed countries. Overall, the effect is lower in capital than in non‐capital regions.


Introduction
Gender equality is currently an important and frequently discussed topic. The European Union has defined targets to achieve equality in labour market participation in the EU and has established a roadmap for increased participation of women in the workforce (European Commission, 2013). Economic independence is a prerequisite for both women and men to be in control of their lives and is said to be a sign of a developed society.
This article compares the impact of gender on the probability of unemployment for 13 European countries, concentrating on the period after the crisis (2011-2014), when the majority of European economies experienced a recession. The gender impact is most likely caused by the uncertainty in the labour markets, along with a higher unemployment rate in the recession period, with female workers enduring a higher negative impact.
There are differences between capital cities and other cities in a country. Capitals are often more heterogeneous than other types of cities: they host the government, and they are likely to be national industrial, cultural and commercial centres with a concentration of company headquarters. They tend to be economically stronger; for example, London has a much higher GDP per capita than the rest of the UK. Eurofound (2020) recently pointed out that: "in Europe, people living in the capital city generally have a better quality of life than people living in other parts of a country. Residents of the capital city also feel higher life satisfaction than people living outside the capital." Capitals are examined separately in Musterd, Marciaczak, van Ham and Tammaru (2016), who investigate socioeconomic differences looking only at European capitals.
The main methodological contribution of this paper is the separation of capital labour markets based on the values of Cook's statistic, which we also use as the evaluation metric for model calibration. Lastly, the general logistic model is applied, controlling for macroeconomic regional characteristics, such as GDP, population density, and degree of area urbanization, as well as individual characteristics such as education, age, gender, marital status, nationality and the occupation specialization of an individual.

Theoretical Background and Model Definition
The applied model is constructed taking into account that our dependent variable is dichotomous, taking the value 1 when an individual is unemployed and 0 when employed, and that the explanatory variables are either continuous or dummy variables for categorical predictors. The basic general logistic regression model (1) is used to regress the probability of unemployment $\pi_i$ for individual $i$ on an explanatory variable $x_i$ with coefficients $\beta_0$ and $\beta_1$:

$$\pi_i = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)} \tag{1}$$

The model may be rewritten in the form of (2):

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_i \tag{2}$$

where $\beta_1$ can be interpreted as the impact of a one-unit change in $x_i$ on the log odds for a continuous variable, and as the impact of having the characteristic $x_i$ (i.e. $x_i = 1$) on the log odds for a dummy variable.

www.czasopisma.uni.lodz.pl/foe/ FOE 6(351) 2020
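As a minimal numerical sketch of models (1) and (2), the snippet below evaluates the logistic function for a single dummy predictor and checks that the gap in log odds between the two groups equals $\beta_1$. The coefficient values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math


def inv_logit(z):
    """Logistic function: maps a linear predictor z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients (assumption, not taken from the paper).
beta0, beta1 = -2.0, 0.3

p_base = inv_logit(beta0 + beta1 * 0)   # dummy x_i = 0
p_group = inv_logit(beta0 + beta1 * 1)  # dummy x_i = 1

# The difference in log odds between the two groups recovers beta1.
log_odds_gap = logit(p_group) - logit(p_base)
```

The probabilities themselves change non-linearly with the predictor, which is why the interpretation of $\beta_1$ is stated on the log odds scale.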
The general logistic regression model (1) is modified by increasing the number of explanatory variables. The probability of being unemployed $\pi_i$ for an individual $i$ is assumed to depend on these exogenous variables:
1) $x^s_{1i}$, the first difference of regional GDP per capita, standardised using (8);
2) $x^s_{2i}$, the first difference of regional population density, standardised using (8);
3) $x_{3i}$ and $x_{4i}$, dummies for the degree of area urbanization: a) densely populated area (the base), b) intermediate area ($x_{3i}$), c) thinly populated area ($x_{4i}$);
4) $x_{5i}$ to $x_{7i}$, year dummies for the analyzed time period 2011-2014: year 2011 is the base, $x_{5i}$ takes the value 1 for the survey year 2012, $x_{6i}$ takes the value 1 for the survey year 2013, and $x_{7i}$ takes the value 1 for the survey year 2014;
5) $x_{8i}$, a dummy variable immig, which takes the value 1 for an immigrant and 0 for a native;
6) $x_{9i}$, a gender dummy variable female, with 1 for female and 0 for male;
7) $x_{10i}$, a marital status variable married, with 1 for a married individual and 0 otherwise;
8) $x_{11i}$ to $x_{14i}$, dummies for 5 age groups: up to 29 years (the base), individuals aged 30 to 39 ($x_{11i}$), 40 to 49 ($x_{12i}$), 50 to 59 ($x_{13i}$), and 60 to 66 ($x_{14i}$);
9) $x_{15i}$ to $x_{16i}$, dummies for the highest educational attainment level, with three categories: primary (the base), secondary ($x_{15i}$) and university ($x_{16i}$);
10) $x_{17i}$ to $x_{24i}$, dummies for the profession of an individual as an occupation group by current employment or last employment before becoming unemployed (9 groups by NACE classification 2 with group 1 as the base).
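The dummy coding with a base category described in the list above can be sketched as follows. The function name and the string labels for the age groups are illustrative assumptions; only the grouping itself comes from the text.

```python
def dummy_code(value, categories, base):
    """Return 0/1 indicators for every category except the base one."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories if c != base]


# Age groups as listed in the text; the labels are illustrative.
AGE_GROUPS = ["up to 29", "30-39", "40-49", "50-59", "60-66"]

# An individual in the base group gets all four dummies equal to 0.
base_row = dummy_code("up to 29", AGE_GROUPS, base="up to 29")

# An individual aged 40 to 49 gets a 1 only in the second dummy.
row_40s = dummy_code("40-49", AGE_GROUPS, base="up to 29")
```

Dropping the base category keeps the design matrix full rank once an intercept is present, so each coefficient is read as a contrast against the base group.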
The macroeconomic variables $x_1$ and $x_2$ are downloaded from the Eurostat webpage for each NUTS region and year of analysis and merged with the information about each individual's location. Based on Okun's law, which defines the inverse relationship between economic growth and the unemployment rate (for more see e.g. Knotek, 2007), we expect that the estimated coefficient on GDP will be negative.

The full regression model is:

$$\pi_i = \frac{\exp(z_i)}{1 + \exp(z_i)} \tag{3}$$

where the linear predictor $z_i$ is:

$$z_i = \beta_0 + \beta_1 x^s_{1i} + \beta_2 x^s_{2i} + \sum_{j=3}^{24} \beta_j x_{ji} \tag{4}$$

The estimation of parameters is done using the Maximum Likelihood method, where the log-likelihood function is written in equation (5) (for more see Dobson, 1990 or Wooldridge, 2002):

$$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \log \pi_i + (1 - y_i) \log(1 - \pi_i) \right] \tag{5}$$

where $y_i$ is the observed unemployment indicator of individual $i$ and $n$ is the number of individuals.

Each model for each country/capital is tested for multicollinearity of explanatory variables using the variance inflation factor (VIF). If any terms in a linear model have more than 1 df (as is the case for the dummies for categorical variables), then generalized variance-inflation factors (Fox, Monette, 1992) are calculated. The generalized VIFs are invariant with respect to the coding of the terms in the model. To adjust for the dimensions of dummy variables, GVIFs are scaled:

$$\mathrm{GVIF}^{1/(2 \cdot df)} \tag{6}$$

where $df$ are the degrees of freedom associated with the term. We conclude that there is no multicollinearity problem in our models that could significantly affect the regression results.

Models were also evaluated by calculating diagnostics for generalized linear models (for more see Davison, Snell, 1991). Cook's distance $D_i$ is calculated using (7) by removing point $i$ from the regression:

$$D_i = \frac{\sum_{j=1}^{n} \left( \hat{Y}_j - \hat{Y}_{j(i)} \right)^2}{p \, \hat{\sigma}^2} \tag{7}$$

where $\hat{Y}_j$ is the fitted value from the full regression, $\hat{Y}_{j(i)}$ is the new fitted value from the regression without the removed point, $p$ is the number of regression coefficients and $\hat{\sigma}^2$ is the estimated variance from the fitted model including all observations.
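To illustrate the maximum likelihood step, the following sketch fits a logit model with a single dummy predictor by Newton-Raphson iterations. It is a simplified stand-in under stated assumptions: the data are made up (10 individuals per group), the function name `fit_logit` is hypothetical, and the study's own models contain many more regressors and were not necessarily fitted with this exact routine.

```python
import math


def fit_logit(x, y, iterations=25):
    """Newton-Raphson maximum likelihood for a one-predictor logit model."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g) and information matrix entries (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X'WX)^{-1} X'(y - p).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up data: x = 1 marks one group, y = 1 marks unemployment.
x = [0] * 10 + [1] * 10
y = [1] * 2 + [0] * 8 + [1] * 4 + [0] * 6

b0_hat, b1_hat = fit_logit(x, y)
```

With a single dummy the fitted probabilities reproduce the group shares (0.2 and 0.4 here), so the estimates can be checked by hand: the intercept equals the log odds of the base group and the slope equals the difference in log odds.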
Cook's distance gives information on how much all the fitted values in the regression model change when observation $i$ is removed. For each model two plots are displayed: 1) Cook's distance plotted against the standardized leverages, 2) Cook's distance plotted against case number, enabling us to find which observations are influential. The plots help to identify influential outliers and are used for model evaluation. In particular, we consider the cases where the variables defining the degree of urbanization and occupation are included in or excluded from the model, and the cases where capitals are estimated separately from the non-capital regions.
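The leave-one-out definition of Cook's distance can be sketched directly. Note the assumptions: for brevity the example refits an ordinary least-squares line with one predictor, whereas the models in this paper are logit GLMs, for which Cook's statistic is usually obtained from GLM diagnostics rather than by brute-force refitting. The data and function names are hypothetical.

```python
def ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def cooks_distances(x, y):
    """Cook's distance for each point by refitting without that point."""
    n, p = len(x), 2  # p = number of coefficients (intercept, slope)
    a, b = ols(x, y)
    fitted = [a + b * xi for xi in x]
    # Estimated variance from the model fitted on all observations.
    sigma2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    distances = []
    for i in range(n):
        a_i, b_i = ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        refit = [a_i + b_i * xj for xj in x]
        shift = sum((f - r) ** 2 for f, r in zip(fitted, refit))
        distances.append(shift / (p * sigma2))
    return distances


# Made-up data in which the last point is an obvious outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
d = cooks_distances(x, y)
most_influential = d.index(max(d))
```

Plotting `d` against the case number gives the second of the two diagnostic plots described above.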
Differences in labour market mechanisms between capitals and other parts of countries are analysed by a two-step evaluation process: first, we include a dummy variable for the capital (or year and capital) in each country estimation and note the significance of its coefficient; second, we examine Cook's distance to identify outliers and to see whether individuals living in the capital have a notable influence on model calibration. When there is evidence of a difference, we split the data into two data sets and estimate capitals separately from the rest of the country. The size of Cook's distance is also used to evaluate models including variables for the degree of urbanization and occupations. These are included in the final models when their presence decreases Cook's distance and improves the predictability of the model. If this is not the case, they are excluded from the regression model.
To increase comparability of the $\beta_j$ coefficients as measures of influence for continuous variables with different magnitudes, we use a standardization method. A continuous variable $x_{ji}$ is re-scaled using (8).

This detail gives us geographic information about an individual's location and defines regions for which the expected probabilities can be computed. In our analysis we combine three NUTS levels: 1) NUTS 0 level: Netherlands, 2) NUTS 1 level: Austria, Germany, United Kingdom, 3) NUTS 2 level: Czech Republic, Italy, France, Spain, Belgium, Finland, Sweden, Ireland, Slovakia. In total: 13 countries and 124 regions (islands and non-European parts are excluded).
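The standardisation step for the continuous regressors can be sketched as below. This assumes that (8) is the usual z-score (subtract the mean, divide by the sample standard deviation); the formula is not reproduced in this text, so that choice is an assumption. The GDP series is made up, and the example standardises one short series only to keep the sketch small.

```python
from statistics import mean, stdev


def first_differences(series):
    """Year-on-year changes of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def standardise(values):
    """z-score: centre on the mean and scale by the sample std deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical regional GDP per capita series (not Eurostat data).
gdp = [24.0, 25.0, 27.0, 26.0, 28.0]

diffs = first_differences(gdp)
z = standardise(diffs)
```

After this step the coefficients on GDP and population density are expressed per standard deviation of the regressor, which is what makes their magnitudes comparable.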
The regional economic characteristics GDP and population density are downloaded from the Eurostat public online database (Eurostat, 2017). The regional economic statistics are merged with LFS data by the region of an individual's location.

Empirical results
The general logistic regression model in its variations was used to estimate the impact of gender on the probability of being unemployed. The estimated parameters of the whole model are displayed in the Appendix.
The probabilities of being unemployed for males and females are computed using the equation below with the other predictor variables held fixed: continuous variables, in their standardised form, are set to their mean levels and dummies are set equal to 0.

The estimated parameters from (3) and (4), namely $\hat{\beta}_1$, the estimated coefficient for GDP, $\hat{\beta}_2$, the estimated coefficient for population density, and $\hat{\beta}_9$, the estimated coefficient for gender, together with the intercept $\hat{\beta}_0$, are used to calculate the probability of being unemployed $\mathrm{Prob}(y_j = 1)$ for females in region $j$ as a logistic function:

$$\mathrm{Prob}(y_j = 1) = \frac{\exp\left(\hat{\beta}_0 + \hat{\beta}_1 \bar{x}_{1,j} + \hat{\beta}_2 \bar{x}_{2,j} + \hat{\beta}_9\right)}{1 + \exp\left(\hat{\beta}_0 + \hat{\beta}_1 \bar{x}_{1,j} + \hat{\beta}_2 \bar{x}_{2,j} + \hat{\beta}_9\right)}$$

where $\bar{x}_{1,j}$ is the mean of the standardised regional first differences of GDP in euro per inhabitant between 2011 and 2014 for region $j$, and $\bar{x}_{2,j}$ is the mean of the standardised regional first differences of population density between 2011 and 2014 for region $j$. The same probability was calculated for males with the variable $x_{9i} = 0$. Finally, the spatial map of differences in probabilities of unemployment for males and females is presented in Figure 1.
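The regional comparison of male and female probabilities can be sketched as follows. All numbers are hypothetical placeholders rather than estimates from the Appendix, the function and argument names are illustrative, and including the intercept in the linear predictor is an assumption based on the general form of the logit model.

```python
import math


def unemployment_probability(b0, b_gdp, b_dens, b_female,
                             gdp_mean, dens_mean, female):
    """Logistic probability with all other dummies held at 0."""
    z = b0 + b_gdp * gdp_mean + b_dens * dens_mean + b_female * female
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for one country model (assumption).
coef = dict(b0=-2.5, b_gdp=-0.10, b_dens=-0.05, b_female=0.20)

# Hypothetical regional means of the standardised first differences.
region = dict(gdp_mean=0.4, dens_mean=-0.2)

p_female = unemployment_probability(**coef, **region, female=1)
p_male = unemployment_probability(**coef, **region, female=0)

# The quantity mapped per region: female minus male probability.
gap = p_female - p_male
```

Repeating this for every region, with that region's means and the coefficients of its country (or capital) model, yields one gap per region, which is the kind of value a spatial map of gender differences would display.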
The map shows the presence of spatial patterns within regions as well as between some neighbouring countries. The similarities between regions within a country were expected, given the common economic and legislative environments, which were represented by the GDP and population density. Overall, the capital regions differ from non-capital regions; in the former, the estimated differences are lower, or the gender impact is insignificant.
The expected differences in labour markets between capitals and the non-capital regions of a country were also supported by differences found in the rest of the estimated parameters. Immigrants differ less from natives in most capitals in comparison to the rest of the country. This suggests that integration works better in big cities/capitals, which tend to have more opportunities for international workers. Age groups, in general, do not have statistically significant parameters in capital regions. With respect to regional characteristics of economic evolution, GDP per inhabitant has, in the majority of the models, a significant negative impact on the probability of being unemployed in European regions. The results are in line with Okun's law. Population density has mostly a negative impact on the probability of being unemployed and is not significant for some capitals (London, Hamburg and Bremen, Belgium region 31, and Prague), but it is positive for the Netherlands and Finland. The degree of urbanization was included in the model for all countries except for Spain, the Czech Republic and the UK, and it was found to have a decreasing impact. Educational attainment clearly decreases the probability of being unemployed in all regression models. This impact increases with the degree of education, supporting the theory that education improves individuals' positions in the labour market.
The main limitation of the study is the data. Survey data in general under-represent some groups of individuals. Tepperova, Zouhar and Wilksch (2016) mention the under-representation of immigrants, which has a minor impact on this study. The LFS data do not include the consistent information needed for all countries; in order to have comparable data, we analyse a representative sample of 13 European countries.

Conclusion
In this paper, the general logistic regression model is applied to individual data (about 5 million economically active individuals in 13 European countries) to estimate the probability of unemployment while controlling for individual and regional economic characteristics during the economic recession. The estimations were done at country level. Most of the capitals were estimated separately, based on the value of Cook's distance. The map of differences between the regional probabilities of unemployment for males and females showed that there are spatial dependencies between regions inside countries and between some neighbouring countries.
The estimated results suggest that gender does play a role in European labour markets; however, the impact is generally low. For some labour markets, the probability of being unemployed is estimated to be higher for females, while for other markets, the opposite impact was estimated. The impact of gender on the probability of being unemployed was lower, or insignificant, for capitals in comparison to non-capitals.
The estimated parameters, in general, differ for capital and non-capital regions, indicating that different labour market mechanisms are at play. The models show that individual characteristics have a lower impact in capitals, and that married individuals are less likely to be unemployed. Lastly, the unemployment probability decreases as the degree of education increases, and is higher for immigrants in the majority of the European countries investigated.

Tables of results of empirical analysis
Tables set 1: Estimation results of probability of being unemployed, applying the general logit model (defined in (3) and (4))