Stratified Cox Model with Interactions in Analysis of Recurrent Events

The purpose of this paper is the assessment of relative intensity of exit from registered unemployment by means of the analysis of recurrent survival episodes and the comparison of these results with the results obtained for an individual episode. The stratified Cox model with interactions was used. Statistical data collected by labour offices indicate that a large fraction of the unemployed persons is registered multiple times. However, many of them resign from the mediation of labour offices and are subsequently removed from the register. In the article, the intensities of de-registration due to various causes for men and women were compared. The study data came from the database of personal details of people registered by the Poviat Labour Office in Szczecin in 2013. The observation covered the records of their registration until the end of 2014. Gender of the unemployed persons influenced the intensity of de-registrations in the first episodes, partially in the second and third ones, due to various causes, such as finding a job or removal from the register, whereas it did not influence the intensity of de-registrations in the fourth and subsequent episodes. As for the other causes in the subsequent episodes, the differences were also not statistically significant. The proposed analysis may be important for implementing a good policy in the labour market. The identification of persons that resign from the mediation of the labour office is as interesting as the identification of these persons who find a job.


Introduction
Survival analysis focuses on how long an individual remains in a given state until a specific event occurs. It is often the case that we examine processes in the course of which an individual enters the pre-defined state several times. Recurrent event processes are defined as processes that generate specific events more than once (Cook, Lawless, 2007).
They can be analysed by means of selected survival analysis methods, which are used in technical sciences to examine assembly line stoppages or to detect and remediate software bugs. In medicine, the time to relapse is analysed (Sagara et al., 2014), while in socio-economic sciences we can study the length of subsequent episodes of entering and leaving poverty (Sączewska-Piotrowska, 2015), the time between recurrent warranty or insurance claims or the duration of unemployment (Gałecka-Burdziak, 2016; Gałecka-Burdziak, Góra, 2017). In such studies the analysed random variable is the time to event. The variable need not be time, however: in earlier works on the textile industry, the random variable was the yarn length and the event was the occurrence of a yarn defect. Other examples include vehicle mileage until warranty repair or the number of production cycles (Cook, Lawless, 2007).
The purpose of this paper is the assessment of relative intensity of exit from registered unemployment by means of the analysis of recurrent survival episodes and the comparison of these results with the results obtained for an individual episode. Using the stratified Cox regression model with interactions, the relative intensity of ceasing the use of services provided by the labour office in particular episodes by women with respect to men was compared. The time from the moment of registration to de-registration was observed. The de-registration from the labour office could have happened due to various causes: taking up a job, removal from the register or other causes. The study is an important contribution to the process of creating a labour market policy. It allows policy-makers to select from the pool of the registered unemployed a group of individuals who should be covered by activation programmes.

Methodology of research
Selected survival analysis methods for recurrent events and discontinuous risk intervals were used in the research. In medicine, when we analyse the time to a relapse of a chronic disease (e.g. asthma, epileptic seizures or osteoporotic fractures), the term continuous risk intervals is used (Twisk, Smidt, de Vente, 2005). In the investigation of the duration of registered unemployment, it should be taken into account that recurrent episodes are separated by periods of de-registration (Figure 1), which creates discontinuous risk intervals (Guo, Gill, Allore, 2008; Tan, 2014: 60-77). On that account, in this study, the conditional approach was used (Hosmer, Lemeshow, 1999: 308-311; Machin, Cheung, Parmar, 2006: 247; Aalen, Borgan, Gjessing, 2008: 473). The time to the next event was determined using the notion of a time gap (Prentice, Williams, Peterson, 1981; Jiang, Landers, Rhoads, 2006; Bijwaard, Franses, Paap, 2006), i.e. the clock was reset to zero at the beginning of each episode.
The actual study was preceded by the estimation of the median duration, i.e. the survival time at which the survival function is equal to 0.5. The survival function is defined as follows:
$$S(t) = P(T > t) = 1 - F(t) \tag{1}$$
where: T - the event duration, F(t) - the cumulative distribution function of the random variable T. The most frequently used estimator of the survival function is the Kaplan-Meier estimator (Kaplan, Meier, 1958):
$$\hat{S}(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right) \tag{2}$$
where: d_j - the number of events at the moment t_j, n_j - the number of individuals at risk by the moment t_j.
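The Kaplan-Meier estimator and the median duration described above can be sketched in a few lines of Python. This is a minimal illustration and not the software used in the study; the function names and the example durations are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(t_j, S(t_j))] for every distinct event time t_j.

    durations: time spent in the state for each episode
    events:    1 if the episode ended with the event, 0 if right-censored
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t_j in event_times:
        # n_j: episodes still at risk at t_j; d_j: events exactly at t_j
        n_j = sum(1 for t in durations if t >= t_j)
        d_j = sum(1 for t, e in zip(durations, events) if e and t == t_j)
        survival *= 1.0 - d_j / n_j
        curve.append((t_j, survival))
    return curve


def median_duration(curve):
    """Smallest event time with S(t) <= 0.5, or None if S never gets there."""
    for t_j, s in curve:
        if s <= 0.5:
            return t_j
    return None


# Hypothetical durations in months; 0 marks a right-censored episode.
curve = kaplan_meier([2, 3, 3, 5, 8, 8, 12], [1, 1, 0, 1, 0, 1, 0])
print(median_duration(curve))
```

In this toy sample the estimated survival function first falls to 0.5 or below at 8 months. When too few events are observed, `median_duration` returns `None`, which corresponds to a median that cannot be estimated within the observation period.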
In the first stage of the analysis, the author estimated the intensity of de-registration from labour office lists due to any cause and the intensity of de-registration due to three main causes for a single episode. To this end, the Cox proportional hazards model was applied (Bieszk-Stolorz, Markowicz, 2012):
$$h(t, X) = h_0(t) \exp(\beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_n X_n) \tag{3}$$
where: t - time, X = [X_1, X_2, …, X_n] - the vector of explanatory variables, h_0(t) - the baseline hazard.
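For a single dichotomous explanatory variable, the interpretation of the parameter follows directly from the model. Assuming the standard Cox form $h(t, X) = h_0(t)\exp(\beta X)$ with X = 1 for one group and X = 0 for the other, the hazard ratio is:

$$HR = \frac{h(t, X = 1)}{h(t, X = 0)} = \frac{h_0(t)\exp(\beta)}{h_0(t)} = \exp(\beta)$$

The baseline hazard cancels, so the ratio does not depend on time. A value of HR below 1 means a lower intensity of the event in the group coded 1, and a value above 1 means a higher intensity.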
The second stage of the study included the analysis of recurrent episodes. Here the stratified Cox proportional hazards model with interactions could be used (Kleinbaum, Klein, 2005: 352):
$$h_g(t, X) = h_{0g}(t) \exp\left(\sum_{i=1}^{n} \beta_i X_i + \sum_{j=1}^{k-1} \sum_{i=1}^{n} \beta_{ij} D_j X_i\right), \quad g = 1, 2, \ldots, k \tag{4}$$
where: t - time, k - the number of strata, X = [X_1, X_2, …, X_n] - the vector of explanatory variables, g - the stratum number, h_{0g}(t) - the baseline hazard for the stratum g, D_j - the dummy dichotomous variable determining the episode number. The alternative version of the model (4):
$$h_g(t, X) = h^*_{0g}(t) \exp\left(\sum_{j=1}^{k} \sum_{i=1}^{n} \beta_{ij} D_j X_i\right), \quad g = 1, 2, \ldots, k \tag{5}$$
where: t - time, k - the number of strata, X = [X_1, X_2, …, X_n] - the vector of explanatory variables, g - the stratum number, h^*_{0g}(t) - the baseline hazard for the stratum g, D_j - the dummy dichotomous variable determining the episode number.
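The intensity of leaving unemployment with respect to the de-registered person's gender was analysed. The dichotomous variable X was 1 for women and 0 for men. The strata (g) were defined as follows: 1 - the first episode, 2 - the second episode, 3 - the third episode, 4 - the fourth and subsequent episodes. Therefore the model (4) took the following form:
$$h_g(t, X) = h_{0g}(t) \exp(\beta X + \beta_1 D_1 X + \beta_2 D_2 X + \beta_3 D_3 X), \quad g = 1, 2, 3, 4 \tag{6}$$
where:
$$D_j = \begin{cases} 1 & j\text{-th episode} \\ 0 & \text{not } j\text{-th episode} \end{cases}, \quad j = 1, 2, 3$$
Using the estimated parameters of the model (6), we could determine the hazard ratio (HR_j) for the j-th episode:
$$HR_j = \exp(\beta + \beta_j) \text{ for } j = 1, 2, 3, \qquad HR_4 = \exp(\beta) \tag{7}$$
The alternative model (5) took the following form:
$$h_g(t, X) = h^*_{0g}(t) \exp(\beta_1 D_1 X + \beta_2 D_2 X + \beta_3 D_3 X + \beta_4 D_4 X), \quad g = 1, 2, 3, 4 \tag{8}$$
where:
$$D_j = \begin{cases} 1 & j\text{-th episode} \\ 0 & \text{not } j\text{-th episode} \end{cases}, \quad j = 1, 2, 3, 4$$
In this case, the hazard ratio (HR_j) for the j-th episode, which represents the intensity of women leaving unemployment relative to men, is given by:
$$HR_j = \exp(\beta_j), \quad j = 1, 2, 3, 4 \tag{9}$$
The calculations were made using the alternative model (8).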
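The alternative model with gender-by-episode interactions can be illustrated with a short Python sketch. It assumes the linear predictor is the sum of β_j·D_j·X over the four strata and that ties are handled with the Breslow approximation; the function names and the record layout are hypothetical, and the sketch only evaluates the log partial likelihood rather than maximising it.

```python
import math

N_STRATA = 4  # episodes 1, 2, 3 and "4 or later", as defined in the text


def stratum_of(episode):
    """Map an episode number to its stratum; episodes >= 4 share stratum 4."""
    return min(episode, N_STRATA)


def interaction_terms(x, episode):
    """Return [D_1*x, ..., D_4*x], where D_j = 1 only in the j-th stratum."""
    g = stratum_of(episode)
    return [x if j == g else 0 for j in range(1, N_STRATA + 1)]


def log_partial_likelihood(betas, records):
    """Stratified Cox log partial likelihood (Breslow handling of ties).

    records: list of (gap_time, event, x, episode); x = 1 women, 0 men.
    Risk sets are formed within each stratum only, so every stratum
    keeps its own unspecified baseline hazard.
    """
    total = 0.0
    for g in range(1, N_STRATA + 1):
        stratum = [r for r in records if stratum_of(r[3]) == g]
        scores = [
            (t, e, sum(b * z for b, z in zip(betas, interaction_terms(x, ep))))
            for t, e, x, ep in stratum
        ]
        for t_i, event_i, eta_i in scores:
            if event_i:
                risk = sum(math.exp(eta) for t, _, eta in scores if t >= t_i)
                total += eta_i - math.log(risk)
    return total


def hazard_ratios(betas):
    """HR_j = exp(beta_j) for each stratum j in the alternative model."""
    return [math.exp(b) for b in betas]
```

Because each interaction term is non-zero in exactly one stratum, the coefficient β_j is informed only by the episodes of stratum j, and `hazard_ratios` reads the women-to-men ratio for each episode directly from the coefficients. In practice the coefficients would be obtained by maximising this likelihood with a statistical package.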

The data used in the research
The study used individual data of unemployed people registered for the first time in 2013 by the Poviat Labour Office in Szczecin; their registration records were observed until the end of 2014. As in earlier work (Bieszk-Stolorz, 2017a; 2017b), in this study, the causes of de-registration were categorised into three groups: taking up a job (Job), removal from the lists due to causes related to the unemployed person (Removal), and the remaining causes (Other). The detailed description of these categories is shown in Table 1. The category Job contains such causes for de-registering as taking up a subsidised or non-subsidised job or starting an entrepreneurial activity. Removal is understood as the de-registration of an unemployed person from the register because of their absence in the office within a specified time as well as their refusal to accept a job offer. The category Other includes de-registering due to the granting of a disability pension, an allowance or an old-age pension, the individual's moving abroad or death.
Every individual registered by the labour office has their own record of registration. Such records include several episodes, i.e. the subsequent periods of registered unemployment. After a preliminary analysis of the number of episodes in the registration records, a decision was made to divide them into four groups: the records with one, two, three, and four or more episodes. The selection of the last category resulted from a small number of people with at least four unemployment episodes. The analysis covered 5,418 records, including 2,644 records of unemployed women. A vast majority (4,100) contained one episode of unemployment, while 1,078 contained two episodes (Table 2). The first episodes usually ended with taking up a job followed by de-registration (for both men and women). In the case of the second episodes, the pattern was similar in the group of females, while in the group of males de-registration predominantly preceded taking up a job (Table 3).
There were two types of right-censored observations in the research. The first type of censoring occurred when the individual was still in the register at the end of the study. In the analysis of episodes by cause of de-registration, there was one more type of censoring: de-registration due to a cause other than the analysed one.
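The way a registration record turns into episodes with restarted clocks and censoring flags can be sketched as follows. This is an illustration under stated assumptions: the record layout, the function name, the example dates and the use of days as the time unit are all hypothetical; only the end of observation (the end of 2014), the three cause labels and the four strata come from the text.

```python
from datetime import date

STUDY_END = date(2014, 12, 31)  # end of observation, as stated in the text
MAX_STRATUM = 4                 # fourth and subsequent episodes share a stratum


def build_episodes(spells, cause_of_interest=None):
    """Turn one person's registration record into gap-time episodes.

    spells: chronologically ordered list of (registered, deregistered, cause);
            deregistered and cause are None if the person is still registered.
    cause_of_interest: None for "any cause", else "Job", "Removal" or "Other".
    """
    episodes = []
    for number, (start, end, cause) in enumerate(spells, start=1):
        still_registered = end is None
        stop = STUDY_END if still_registered else end
        if still_registered:
            event = 0  # first censoring type: still in the register at study end
        elif cause_of_interest is None or cause == cause_of_interest:
            event = 1  # de-registration due to the analysed cause
        else:
            event = 0  # second censoring type: de-registered for another cause
        episodes.append({
            "stratum": min(number, MAX_STRATUM),
            "gap_time": (stop - start).days,  # clock restarts at each registration
            "event": event,
        })
    return episodes


# Hypothetical record with three episodes, the last one still open.
record = [
    (date(2013, 3, 1), date(2013, 6, 1), "Job"),
    (date(2014, 1, 10), date(2014, 4, 10), "Removal"),
    (date(2014, 11, 1), None, None),
]
for episode in build_episodes(record, cause_of_interest="Job"):
    print(episode)
```

With `cause_of_interest="Job"`, the second episode of this record is treated as censored because it ended in removal, while in the any-cause analysis (`cause_of_interest=None`) it counts as an event.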

Analysis of relative intensity of de-registering from the labour office
The actual study was preceded by the analysis of the median duration of episodes (Table 4). Note that the 50th percentile (median) of the cumulative survival function is usually not the same as the point in time up to which 50% of the sample survived. This would only be the case if there were no censored observations prior to this time (Sokołowski, 2010). In some cases, the estimation of the median is impossible. Such a situation takes place when the intensity of exit from unemployment is low. In the case of other causes of de-registration, fewer than 50% of persons left unemployment in the analysed period (24 months). This applied to all unemployed persons, both men and women, which is why this cause is not presented in Table 4. The same happened in several subgroups for finding a job and removal (such cases are marked with a hyphen in Table 4). In these cases, we can only state that the median duration is higher than 24 months. In the case of any cause of de-registration, the median duration was longer for women than for men (4.47 vs 3.35 months, respectively). The median time to finding a job was lower for women, i.e. women found jobs sooner than men.
Men were removed from the register more intensively: the median duration until removal was lower for men than for women, with the exception of the fourth episode.
In the first stage of the study, the parameters of the Cox hazards models (3) were estimated for single episodes (Table 5). Women were de-registered less intensively than men due to any cause (by 16%) and due to removal (by 45%), but they took up a job with higher intensity (by 10%). For other causes, the β parameter was not statistically significant, so the intensities of de-registration of women and men did not differ significantly.
The second stage of the analysis, i.e. the estimation of parameters of the stratified Cox model with interactions (8), allowed the assessment of the recurrent episodes (Table 6). During the first and second episodes, women were leaving the register with lower intensity than men (by 15% and 19%, respectively). They were also taking up a job with higher intensity during the first episode (by 9%) and were removed from the register with lower intensity during the first (by 44%), second (by 43%) and third (by 49%) episodes. In the other cases, the de-registration intensities of men and women were not significantly different.
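The percentages quoted above can be related to hazard ratios and model coefficients with simple arithmetic. The sketch below assumes that each percentage is computed as (HR - 1) × 100%, with women coded 1 and men 0; the helper names are hypothetical, and the hazard ratios are implied by the percentages in the text rather than copied from the tables.

```python
import math


def hazard_ratio_from_percent(percent):
    """HR implied by a relative change in intensity, e.g. -16 gives 0.84."""
    return 1.0 + percent / 100.0


def percent_change(hazard_ratio):
    """Relative change in intensity (in %) implied by a hazard ratio."""
    return (hazard_ratio - 1.0) * 100.0


def coefficient(hazard_ratio):
    """Cox coefficient beta = ln(HR)."""
    return math.log(hazard_ratio)


# Single-episode results reported in the text: any cause, removal, job.
for label, percent in [("any cause", -16), ("removal", -45), ("job", 10)]:
    hr = hazard_ratio_from_percent(percent)
    print(label, round(hr, 2), round(coefficient(hr), 3))
```

A hazard ratio below 1 corresponds to a negative coefficient and a lower intensity for women, while a ratio above 1 corresponds to a positive coefficient and a higher intensity.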

Conclusions
The applied methods of survival analysis made it possible to assess the intensity of unemployment exit with respect to the gender of the unemployed person. The Cox regression model was used to examine the effect of gender on the intensity of unemployment exit in the case of a single episode. In this case, the intensities of de-registration of men and women from the labour office due to any cause, finding a job and removal were significantly different. The stratified Cox regression model with interactions was used to evaluate the relative intensity of de-registration from the labour office during the subsequent episodes, i.e. the subsequent periods of registration in the labour office. In the case of the first episodes, the intensities of de-registration of men and women were significantly different for any cause, finding a job and removal. For the second episodes, the differences were significant in the case of any cause and removal, while for the third episodes they were significant only for removal from the register. The intensities for women and men were not significantly different for the fourth and subsequent episodes. As for the other causes in the subsequent episodes, the differences were also not statistically significant. The proposed analysis may be important for implementing a good policy in the labour market. Other studies very often focus only on unemployed persons finding a job. The data published by the Central Statistical Office indicate that there is a considerable group of unemployed persons who resign from the mediation of labour offices without giving a reason. It would be interesting to extend the study and examine the impact of the unemployed Poles' other reasons for de-registration. Unfortunately, the restricted access to individual data is a serious research limitation.