Survival Modelling of Repeated Events Using the Example of Changes in the Place of Employment

This paper concerns the issue of survival modelling in the case of repeated events. In modelling events of this type, attention should be paid to the dependence among the analysed durations, as well as to unobserved heterogeneity. One way to include these aspects in the analysis is to use models with random effects. The primary objective of this paper is to present the application of such models to the analysis of changes in the place of employment. The duration of individual periods of employment for the surveyed employees was modelled. The approach used made it possible not only to identify factors influencing decisions on job changes, but also to assess the risk of events such as termination of employment and to examine the impact of unobserved heterogeneity on the estimation results.


Introduction
A professional career can be considered as a sequence of definite events generated by a random mechanism over an individual's life cycle (Willekens, 1999). Therefore, events such as the commencement or termination of an employment relationship may occur for a given individual many times during his or her career. In this article, the duration of individual periods of employment of the respondents was modelled; therefore the events considered were terminations of employment relationships.
The departure of employees from an organisation may be forced or voluntary (Sochacka, 2012). In the first case, the employment relationship is terminated at the initiative of the employer; in the other, it is at the employee's own initiative. In addition, an employment relationship may be terminated, for example, by agreement between the parties or by the employee's retirement. Regardless of the reasons for the termination of an employment relationship, many factors can be distinguished that may determine the time for which an employee remains at a given enterprise. These include factors describing the current socioeconomic situation of the country, the characteristics of the company and the characteristics of the employee himself or herself. In this article, attention was focused on the latter group of determinants. In the context of these factors, it is helpful to consider the results of a study by C. Tanova and B.C. Holtom (2008). According to those researchers, the decision to change one's place of employment results not only from an individual's attitude to work or real opportunities in the labour market, but can also be the result of job embeddedness.
In most current publications on modelling the duration of employment, only the last period of employment is taken into account, and where there are many such periods, each of them is modelled separately (Giannelli, Jaenichen, Rothe, 2016; Grzenda, 2017). The purpose of the presented study was to identify and assess those individual characteristics that affected the moment of termination of an employment relationship, on the basis of an analysis of all previous periods of employment of a given individual since age 15. Therefore, it was necessary to use models for recurring events. An analysis of a similar type of work-related events was made by B. Bieszk-Stolorz (2018). In that work, a stratified Cox regression model was used to analyse multiple episodes of registered unemployment. This model, despite its numerous advantages, does not allow one to take into account the links between the modelled events, because each of them is considered separately.
In the presented study, parametric survival models (Cox, Oakes, 1984; Miller Jr, 2011) were used to model the time to the occurrence of the event, which was the termination of an employment relationship. Models of this type enable the assessment of the influence of the vector of explanatory variables on the intensity of transitions between specific states. The selection of variables for the model is often limited by the available database; moreover, some variables that may potentially affect the risk may be unobservable. A consequence of omitting from the model observable or unobservable explanatory variables that differentiate the observations studied is the phenomenon of unobserved heterogeneity. Models with unobserved heterogeneity can be considered under two approaches: individual and group (Kleinbaum, Klein, 2006). In this paper, due to the subject of the research, attention was focused on the latter approach. Therefore, it was assumed that individuals within the same group might be similar to each other in terms of some unobserved factors. Consequently, durations to the occurrence of the event being studied may be correlated within an investigated group (Morris, Christiansen, 1995).
In survival analysis, so-called frailty models (Hougaard, 1991; 1995; Wienke, 2011) are used to model unobserved heterogeneity. Models in which the random effect is treated in a group approach are more precisely called shared frailty models (Gutierrez, 2002). Models for repeated events are a special case of models for grouped data. Depending on how the factor expressing unobserved heterogeneity is treated, fixed effects models and random effects models are distinguished. In this article, random effects models for repeated events (Allison, 2010) are used for modelling the duration of periods of employment in an individual's professional career.

Modelling repeated events using exponential and Weibull models
When a repeated event is considered for the same unit, it can be expected that the observed times to the event may be correlated with each other. Therefore, modelling repeated-event data requires that the dependence among observations concerning the same unit be taken into account. Neglecting this phenomenon may result in biased standard error estimates and in overstated test statistic values. According to P. D. Allison (2010), some methods may limit the bias in the estimated standard errors, but they do not eliminate the bias of parameter estimates resulting from unobserved heterogeneity. Consequently, when estimating survival models, this may lead to incorrect estimates of the hazard function.
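The dependence described above can be illustrated with a small simulation. The sketch below is not part of the study: it assumes an exponential model with a normally distributed random effect shared by two episodes of the same individual, and all names and parameter values (`mu`, `sigma`, the sample size) are hypothetical.

```python
import math
import random

def simulate_pairs(n_individuals, sigma, mu=-3.0, seed=0):
    """Draw two exponential episode durations per individual.

    Both episodes share one random effect eps ~ N(0, sigma^2) on the
    log-hazard scale; mu and sigma are hypothetical illustration values.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_individuals):
        eps = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
        rate = math.exp(mu + eps)  # constant hazard of this individual
        pairs.append((rng.expovariate(rate), rng.expovariate(rate)))
    return pairs

def log_duration_correlation(pairs):
    """Pearson correlation between the log-durations of episodes 1 and 2."""
    xs = [math.log(a) for a, _ in pairs]
    ys = [math.log(b) for _, b in pairs]
    n = len(pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

corr_shared = log_duration_correlation(simulate_pairs(5000, sigma=1.0))
corr_independent = log_duration_correlation(simulate_pairs(5000, sigma=0.0))
```

Under these assumptions the theoretical correlation of the log-durations is σ²/(σ² + π²/6), so it is clearly positive when σ = 1 and zero when the shared effect is absent. A model that treats the episodes as independent ignores this correlation.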
This article considers parametric survival models (Cox, Oakes, 1984). The most popular model of this group is the exponential model. The hazard function for this model is constant over time and can be written as follows:

$$h(t) = \lambda, \quad \text{i.e.} \quad \ln h(t) = \mu, \quad \mu = \ln \lambda.$$

A generalisation of this model is a model in which the logarithm of the hazard function is a linear function of time:

$$\ln h(t) = \mu + \alpha t.$$

By adopting ln(t) instead of t, we obtain the Weibull model,

$$\ln h(t) = \mu + \alpha \ln t,$$

which can be written in an equivalent form:

$$h(t) = \lambda t^{\alpha}.$$

With a vector of explanatory variables $x$, the exponential model takes the form

$$\ln h(t) = \mu + \beta^{\top} x,$$

whereas for the Weibull model:

$$\ln h(t) = \mu + \alpha \ln t + \beta^{\top} x.$$

Let $h_{ij}(t)$ denote the hazard function for the j-th event for the i-th individual at time t, i = 1, 2, …, n, j = 1, 2, …, $m_i$, where $m_i$ is the number of events for the i-th individual. Moreover, let $x_{ij}$ denote the vector of explanatory variables for the i-th individual and the j-th event, and β the parameter vector. The total number of observations is $N = \sum_{i=1}^{n} m_i$. In this paper, the exponential model and the Weibull model are considered. Then

$$\ln h_{ij}(t) = \mu + \alpha \ln t + \beta^{\top} x_{ij} + \varepsilon_i,$$

where $\varepsilon_i$ is the factor by which the unobservable effect of the i-th individual is expressed, i = 1, 2, …, n. In this paper, models with random effects are considered; therefore, $\varepsilon_i$ is a random variable with a given distribution. Usually the normal distribution with mean 0 and variance $\sigma^2$ is chosen for $\varepsilon_i$, but it should be added that these models may be sensitive to the selection of the distribution for this random variable (Allison, 2010). Another commonly chosen distribution for this random variable is the gamma distribution (Morris, Christiansen, 1995; Fan, Li, 2002).

If, suppressing the episode index j, we denote

$$\eta_i = \mu + \beta^{\top} x_i + \varepsilon_i,$$

then for the Weibull model the log-likelihood for the i-th individual, conditional on $\varepsilon_i$, is given by the formula:

$$\ln L_i = \nu_i \left( \eta_i + \alpha \ln t_i \right) - \frac{\exp(\eta_i)\, t_i^{\alpha + 1}}{\alpha + 1}, \qquad \alpha > -1,$$

where $t_i$ is the survival time and $\nu_i$ denotes the censoring variable, with $\nu_i = 0$ if the unit is right-censored and $\nu_i = 1$ for the unit for which the event occurred, i = 1, 2, …, n. Thus, on taking α = 0, the exponential model is obtained. More on various types of survival models with random effects can be found in (Wienke, 2011). A literature review on other models used to model multiple episodes of a professional career can be found in the monograph (Landmesser, 2013).
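To make the role of the random effect in the likelihood more concrete, the following sketch evaluates a Weibull log-likelihood under the assumption that the hazard is parameterised as ln h(t) = η + α ln t with η = μ + β'x + ε, and that ε is normal with mean 0 and variance σ². The function names, the 3-point Gauss-Hermite rule and the example values are illustrative assumptions; the estimation procedure actually used in the study may differ.

```python
import math

def weibull_loglik(t, event, eta, alpha):
    """Conditional log-likelihood of one episode when ln h(t) = eta + alpha*ln(t).

    t: duration (> 0); event: 1 if the episode ended, 0 if right-censored;
    eta: linear predictor mu + beta'x + eps; alpha: shape parameter (> -1).
    """
    if t <= 0 or alpha <= -1:
        raise ValueError("requires t > 0 and alpha > -1")
    log_hazard = eta + alpha * math.log(t)
    cumulative_hazard = math.exp(eta) * t ** (alpha + 1) / (alpha + 1)
    return event * log_hazard - cumulative_hazard

def marginal_loglik(episodes, mu, beta, alpha, sigma):
    """Log-likelihood of one individual with eps ~ N(0, sigma^2) integrated out.

    Uses a deliberately coarse 3-point Gauss-Hermite rule.
    episodes: list of (t, event, x) for one individual, x a list of covariates.
    """
    root_pi = math.sqrt(math.pi)
    node = math.sqrt(1.5)
    rule = [(-node, root_pi / 6), (0.0, 2 * root_pi / 3), (node, root_pi / 6)]
    total = 0.0
    for point, weight in rule:
        eps = math.sqrt(2.0) * sigma * point  # change of variable for N(0, sigma^2)
        conditional = sum(
            weibull_loglik(
                t, event, mu + sum(b * v for b, v in zip(beta, x)) + eps, alpha
            )
            for t, event, x in episodes
        )
        total += weight * math.exp(conditional)
    return math.log(total / root_pi)

# Hypothetical individual: one completed and one censored episode (months).
example_episodes = [(24.0, 1, [1.0]), (36.0, 0, [1.0])]
example_value = marginal_loglik(example_episodes, -3.0, [0.5], -0.05, 0.8)
```

Setting `alpha` to 0 gives the exponential case, and setting `sigma` to 0 reduces the marginal value to the sum of the conditional episode contributions. For real data a finer quadrature rule and a numerically stable log-sum-exp would be advisable.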

The scope of the study
The study used a data set derived from the panel survey entitled Generations and Gender Survey (GGS) for Poland, conducted as part of the Generations and Gender Program (GGP). The data come from the second half of 2014; in addition, the values of some features were supplemented based on earlier research carried out in 2010-2011. The GGS survey is conducted on a random sample of respondents aged 18-79. In the presented study, individuals who were aged 18-44 at the time of the study were selected from the entire data set, yielding 2880 observations. Then, for each individual, all of his or her periods of employment were distinguished based on information contained in the variables related to the commencement or termination of work at a given place and the transition to employment elsewhere. In addition, those periods of employment were assigned the values of other time-varying characteristics of the respondents, if such information was included in the data. The study considered only work undertaken after the age of 15. In this way, 6298 observations were obtained and modelled. The dependent variable was the duration of the individual periods of employment in months. For all individuals considered, 3924 events, i.e. exits from working status, were observed, with a maximum of 9 employment periods per individual. At the time of the research, 2374 persons were in employment, while 506 were unemployed. Based on that information, a censoring variable was created for modelling purposes, which takes the value 1 if the event occurred, i.e. the employment relationship was terminated, and 0 if the respondent had a job at the time of the research.
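The construction of the modelling data set described above can be sketched as follows. The layout of the input (pairs of start and end months, with `None` for a job still held at the interview) and the function name are assumptions made for illustration; they do not reproduce the actual GGS variable structure.

```python
def build_episodes(spells, interview_month):
    """Turn one respondent's employment spells into (duration, event) records.

    spells: list of (start_month, end_month) pairs, with end_month=None for a
    job still held at the interview; months are counted on any common scale.
    """
    records = []
    for start, end in spells:
        if end is None:
            records.append((interview_month - start, 0))  # right-censored
        else:
            records.append((end - start, 1))              # termination observed
    return records

# Hypothetical respondent: two finished jobs and one ongoing job.
example = build_episodes([(0, 18), (20, 50), (55, None)], interview_month=90)

# Consistency check on the counts reported in the text: observed events plus
# ongoing (censored) employment periods should equal all modelled episodes.
events, censored, total = 3924, 2374, 6298
counts_agree = (events + censored == total)
```

Each respondent thus contributes one record per employment period, and only the period still in progress at the interview is censored.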
The GGS data used in the modelling do not provide information as to the state into which the individual passed after the termination of an employment relationship, but their advantage is the possibility of establishing the value of other socioeconomic characteristics for a specific point in the respondent's professional career. Table 1 presents a set of potential explanatory variables included in the study. The structure of all variables listed in Table 1 was given at the time of the research, while the variables marked with (*) were included in the model as variables changing over time.

Model estimation
The Weibull model was used to model the duration of individual employment periods due to its desirable properties in this type of analysis (Allison, 2010; Landmesser, 2013). In the first stage of the research, this model was constructed with all of the explanatory variables presented in Table 1 in order to examine their impact on the duration of the distinguished periods of employment. Some of the variables considered were statistically insignificant. The results of estimating the Weibull model with variables for which at least one level turned out to be statistically significant are presented in Table 2. Based on the results obtained, it can be concluded that the variance of the random component is statistically significant. This means that in the modelling of individual employment periods it is advisable to use models with random effects. The estimate of the shape parameter in the Weibull model is -0.0256, which means that a 1% increase in the time of remaining in employment was associated with approximately a 0.03% decrease in the risk of termination of the employment relationship. This result is somewhat surprising, as one might rather expect that a longer period in employment would increase the likelihood of termination. Therefore, in order to analyse that result more accurately, in the last stage of the research, the hazard function for selected individuals was determined. In addition, based on the obtained p-value for the alpha parameter, it can be concluded that this parameter is significant at the level of 0.1, but not at the level of 0.05. This suggests that the Weibull model considered in that analysis could be replaced with the exponential model, which was confirmed by the likelihood ratio test. For that test, the obtained value of the test statistic was 2, while the critical value at the significance level of 0.05 was 3.841. Therefore, in the next stage of the research, the exponential model was estimated for those data.
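The two numerical statements in this paragraph can be checked with a short calculation. The sketch assumes that the likelihood ratio statistic is compared with a chi-square distribution with one degree of freedom (the exponential model restricts the single parameter α to 0) and that α is the coefficient of ln t in the log-hazard; the helper names are illustrative.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_critical_1df(level, lo=0.0, hi=50.0):
    """Critical value found by bisection on the upper-tail probability."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf_1df(mid) > level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

lr_statistic = 2.0                           # value reported in the text
p_value = chi2_sf_1df(lr_statistic)          # about 0.157
critical = chi2_critical_1df(0.05)           # about 3.841
reject_exponential = lr_statistic > critical

# Elasticity reading of the shape parameter: effect of a 1% longer duration.
alpha = -0.0256
pct_change_in_hazard = (1.01 ** alpha - 1.0) * 100.0
```

Since the statistic is below the critical value, the restriction α = 0 is not rejected at the 0.05 level. The last line gives a change in the hazard of roughly -0.03%, in line with the interpretation of the shape parameter given in the text.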
Because the previously used Weibull model is a generalisation of the exponential model, the obtained estimates differed only slightly from those in Table 2; they are therefore omitted from the presentation. Based on the estimates of the Weibull model parameters given in Table 2, it can be concluded that the time to termination of employment was shorter in the case of persons who worked in the private sector than in the case of those employed in the public sector. However, for persons performing other types of work, the time to termination was longer than for those employed in the public sector. People with a lower level of education than a master's degree had a shorter time to termination of employment. Persons with a child had a longer time to termination than those without children. For married persons, the time to termination was shorter than for single persons. The time to termination was longer for men than for women. People aged between 25 and 34 and between 35 and 44 also had longer times to termination of employment than those in the youngest age group. It was also found that people living in cities experienced shorter times to termination of employment than rural residents.
Given the high importance in Poland of the so-called traditional social roles of women and men (Kotowska, Sztanderska, Wóycicka, 2007), in the next stage of the study, models for women and men were estimated separately. The results of estimation of the Weibull model for women are presented in Table 3. Based on the results (Table 3), it can be concluded that in the case of women, the risk of termination of employment was constant over time; therefore the appropriate model for the examined event was the exponential model. In addition, variables describing the class of place of residence at the time of the research turned out to be statistically insignificant; therefore those variables were omitted in the exponential model, for which the estimation results are presented in Table 4. The values of the parameter estimates in the two models for women are very similar to each other. Because in the exponential model the hazard function is constant and the average time to the occurrence of the event is the inverse of the hazard, the results of the latter model were interpreted in detail (Table 4). It was found that the average time to termination of employment for women who worked in the private sector was 25.54% shorter than in the case of women working in the public sector. However, for women performing other types of work, the average time to termination was more than twice as long as that for women employed in the public sector. Women with a lower level of education than a master's degree had a shorter average duration of employment: by 44.09% in the case of women with bachelor's, engineering, post-secondary or secondary vocational education, by 53.32% in the case of women with general secondary education, by 57.03% in the case of women with basic vocational education, and by 47.73% in the case of the least-educated women.
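The percentage interpretation used here follows from the fact that in the exponential model the mean duration is the inverse of the hazard. The sketch below assumes that the reported coefficients are on the log-hazard scale, so that a coefficient β multiplies the mean duration by exp(-β). The function names are illustrative, and the coefficient back-computed from the reported 25.54% is only an implied value, not a figure taken from Table 4.

```python
import math

def pct_change_mean_duration(beta):
    """Percentage change in mean duration for a one-unit covariate increase,
    assuming beta is a log-hazard coefficient in an exponential model."""
    return (math.exp(-beta) - 1.0) * 100.0

def implied_beta(pct_change):
    """Inverse mapping: log-hazard coefficient implied by a percentage change."""
    return -math.log(1.0 + pct_change / 100.0)

# If the hazard doubles (beta = ln 2), the mean duration is halved.
halved = pct_change_mean_duration(math.log(2.0))

# Coefficient implied by "25.54% shorter" (back-computed, not a Table 4 value).
beta_private = implied_beta(-25.54)
```

A positive coefficient therefore corresponds to a shorter average time to termination, and a negative one to a longer average time.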
Women with a child had an average duration of employment more than six times longer than childless women, while for married women the duration was 21.81% shorter than for unmarried women. Compared with women from the youngest age group, women aged from 25 to 34 and from 35 to 44 years had longer average durations of employment: by 50.70% and 72.93% respectively. Moreover, in both of the models for women (Table 3, Table 4), it was found that the variance expressing the difference between the considered individuals was statistically significant. In the case of men, the Weibull model turned out to be the appropriate model describing the time to the termination of an employment relationship. The estimation results for this model are presented in Table 5. As in all of the previous models, the variance of the random component turned out to be statistically significant. Based on the estimation of the alpha parameter, it can be concluded that a 1% increase in the duration of employment was associated with a drop of approximately 0.05% in the risk of termination of an employment relationship. The direction of the impact of the considered variables on the moment of the termination of the employment relationship in the model for men is the same as in the model for the entire surveyed population. The largest differences in the values of parameter estimates were obtained in the case of variables describing the education level and place of residence. Comparing the results obtained using the Weibull model for men with those obtained using the same model for women, the direction of the impact of individual factors on the duration of the employment relationship is found to be the same (Table 3, Table 5). However, the scale of this impact is different; the largest differences were observed in the variables describing the education level and family situation of the respondent.
The reduction in the time to the termination in the case of respondents with education below master's degree level, compared with the best-educated group, was greater in the case of women than men. Men with children, more so than women with children, experienced longer times to the termination of employment than those without children. On the other hand, married men experienced accelerated termination of employment compared with unmarried men, more so than married women compared with unmarried women.
In the next stage of the study, hazard functions were estimated separately for women and men characterised by the following features: working in the private sector; having bachelor's, engineering, post-secondary or secondary vocational education; having a child; being married; being aged 25 to 34 years; and for men, living in a city with at least 100 thousand residents (Figure 1). In the case of women, the results obtained in the exponential model were used, hence the hazard function is constant throughout the analysed period. However, in the case of men, the hazard function decreases for about 50 months, after which it is approximately constant. It may also be seen that the risk of termination of employment in the case of women was about twice as high as in the case of men.
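The shape of the hazard function for men can be illustrated numerically. The sketch assumes a Weibull hazard of the form h(t) = exp(η) t^α, takes α = -0.05 as the rounded elasticity quoted in the text for men, and uses a purely hypothetical linear predictor η, since the estimates from Table 5 are not reproduced here.

```python
import math

def weibull_hazard(t, eta, alpha):
    """Hazard h(t) = exp(eta) * t**alpha, i.e. ln h(t) = eta + alpha*ln(t)."""
    return math.exp(eta) * t ** alpha

alpha = -0.05  # rounded elasticity quoted for men in the text
eta = -4.0     # hypothetical linear predictor, chosen only for illustration

months = [1, 12, 50, 100, 200]
profile = [(m, weibull_hazard(m, eta, alpha)) for m in months]

# Relative declines do not depend on eta, only on alpha and the time ratio.
drop_first_50 = 1.0 - weibull_hazard(50, eta, alpha) / weibull_hazard(1, eta, alpha)
drop_50_to_100 = 1.0 - weibull_hazard(100, eta, alpha) / weibull_hazard(50, eta, alpha)
```

Under these assumptions the hazard falls by about 18% over the first 50 months but only by about 3% between months 50 and 100, which is why the curve appears almost flat after the initial decline even though it never becomes exactly constant.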

Conclusions
This study has focused on modelling events that may occur more than once for a given unit in the investigated period. The events considered are terminations of employment relationships, which may occur several times during the professional career of an individual. The modelling of recurring events, due to the possibility of correlation between observed durations for a given unit, requires the use of appropriate methods of analysis. This article uses parametric survival models with random effects. The applied approach allowed unobserved heterogeneity to be taken into account in the modelling. Neglecting this phenomenon may lead to incorrect estimates of the model parameters, which in a survival model results in an incorrect assessment of the impact of the examined factors on the intensity of transitions between the examined states. In all of the considered models, the variance expressing the difference between the surveyed individuals turned out to be statistically significant. This means that the hypothesis of lack of correlation between the times to recurring events for a given individual should be rejected. Therefore, it can be concluded that to analyse the durations of periods of employment in individuals' professional careers, the models used should enable unobserved heterogeneity to be taken into account. As a result of the analysis, it has been found that there are still differences in the labour market in the context of gender (Sztanderska, 2005; Kotowska, Sztanderska, Wóycicka, 2007). Nonetheless, for most of the considered variables, their impact on the risk of termination of employment was similar for women and men. The comparative analysis showed that with the same set of characteristics, the risk of termination of employment was almost twice as high for women as for men. It is significant that in the case of women, higher educational status lowers the risk of termination of employment to a greater degree than in the case of men.
Considering the results obtained in the context of the family situation, it was found that, regardless of sex, people who were married before the beginning of employment were likely to see the employment relationship terminated sooner than unmarried persons. This may be because such persons changed their jobs due to financial considerations or the need to balance professional and family duties. As regards the impact of having a child, the results obtained here are only partially consistent with those reported previously in the literature. According to I. Kotowska, U. Sztanderska and I. Wóycicka (2007), having a child has a positive influence on employment in the case of men, while in the case of women, the opposite relationship to that obtained in the presented study was observed. However, it should be taken into account that the variable included in this study describes having a child before a given period of employment or the birth of a child during that period. In the first case, women may have chosen in advance a job that allowed them to reconcile their professional and family obligations, while in the other case, according to employment legislation, employers are obliged to allow women to return to work following maternity or parental leave. In addition, according to other slightly earlier studies, if the right conditions are met, having a child has a positive effect on employment in the case of women (Balbo, Billari, Mills, 2013).
Attention should also be paid to the situation of young people in the labour market in Poland. According to a report by the Central Statistical Office (CSO, 2016), young people up to 24 years of age are subject to the highest unemployment rate. The results obtained in this research also indicate that this was the group of people with the highest risk of employment termination. This may be related to the fact that young people often take temporary work and use their acquired qualifications to find a job that better suits their professional expectations.
Further information on the situation in the labour market in Poland is provided by the results obtained for the variable describing the type of work performed. It has been found that the public sector is still more stable in terms of employment than the private sector. However, the highest employment stability was obtained for people performing other types of work, such as the self-employed.