Prediction of Bank Distress – Regional Differences and Macroeconomic Conditions

In this study, we focus on distress events of European banks over the period 1990–2015, using an unbalanced panel of 3,691 banks. We identify 132 distress events, which include actual bankruptcies as well as bailout cases. We apply CAMEL-like bank-level variables and macroeconomic control variables (GDP growth, inflation, unemployment rate). The analysis is based on traditional logistic regression and k-means clustering. We find that the probability of distress is connected with macroeconomic conditions via regional grouping (clustering). The bank-level variables that were stable predictors of distress from 1 to 4 years prior to the event are the equity to total assets ratio (leverage) and the loans to funding ratio (liquidity). Among the macroeconomic factors, GDP growth is a useful variable, but its impact differs by horizon: 1 year before distress, high distress probability is associated with low GDP growth, whereas 2, 3 and 4 years before distress, high distress probability is associated with high GDP growth. This shows the changing role of the macroeconomic environment and indicates that favorable macroeconomic conditions may contribute to the build-up of systemic problems in the banking sector.


Introduction
Predicting bank failures is challenging because such failures are rare. Research on bank failures was initiated by Sinkey Jr (1975) for American banks and has since focused mostly on the US. In Europe, the number of studies is much more limited; however, the wave of bank bailouts that swept across Europe during the global financial crisis (GFC) made it possible to study bank problems in Europe (e.g., Poghosyan, Cihak, 2011; Altman, Cizel, Rijken, 2014; Betz et al., 2014; Iwanicz-Drozdowska, Laitinen, Suvas, 2018).
Because the number of actual bank failures is low, cases of bailouts and forced mergers have been included in bank failure studies and called distress events (e.g., Arena, 2008; Betz et al., 2014; Altman, Cizel, Rijken, 2014). We follow this approach, which is now well established in the literature. Nevertheless, the number of distress events in Europe is still much lower than in the US¹, which makes bank bankruptcy prediction a difficult task.
This study extends earlier research by Iwanicz-Drozdowska, Laitinen and Suvas (2018), using the same database but accounting for macroeconomic factors and the heterogeneity of countries. The goal of this paper is to identify the link between macroeconomic conditions and bank distress while accounting for the heterogeneity of countries in terms of their risk of bank distress. Accounting for country heterogeneity and combining prediction models with clustering adds to the research on bank distress prediction.
The remaining part of the paper is organized as follows. Section 2 reviews the literature and presents the research hypotheses. Section 3 presents the data and methodology. Section 4 discusses the empirical results, and Section 5 addresses the policy implications and presents conclusions.

Literature review and research hypotheses
The analyses of bank financial difficulties have been conducted at the macro- and microeconomic levels. At the macroeconomic level, research focused on detecting early warning signs of banking crises and relied largely on macroeconomic and industry-level data. This stream of research developed especially after the outbreak of the Asian crisis in the late 1990s and again after 2008 (Drehmann, Juselius, 2014). At the microeconomic level, research targeted the detection of bank failures, mostly in the US banking sector (e.g., Sinkey Jr, 1975; Peek, Rosengren, 1996; Wheelock, Wilson, 2000; Kolari et al., 2002; Cole, White, 2012; Shaffer, 2012; Cox, Wang, 2014; López Iturriaga, Sanz, 2015). Studies of bank failures in other countries are few because failures there are rare. This stream of research relies largely on bank-level data, with bank-specific traits typically based on the CAMEL² approach. CAMEL has been one of the most popular approaches to assessing banks' financial position, both in research and in supervisory practice (e.g., Lopez, 1999), because it covers the most important aspects of bank risk and performance. Against this background, we extend the microeconomic analysis of bank problems by using both bank-level and macroeconomic characteristics. Our literature review is therefore selective and focuses on studies that accounted for the impact of the macroeconomic environment on bank distress. To the best of our knowledge, two studies meet these criteria: Kapinos and Mitnik (2016) for US banks and Betz et al. (2014) for European banks.
¹ For 2000–2011, Hambusch and Shaffer (2016) registered 441 bank failures in the US, far more than the number of distress events among European banks in the same period.
Kapinos and Mitnik (2016) conducted stress tests for medium and large banks, showing how resilient individual banks and the banking sector are to macroeconomic shocks. They applied principal component analysis (PCA) to macroeconomic factors and the least absolute shrinkage and selection operator (LASSO) combined with PCA to bank-level characteristics. The study accounted for the heterogeneity of banks: it identified the macroeconomic drivers of bank-level variables and the bank-level factors that explain differences in banks' reactions to macroeconomic shocks. The macroeconomic variables included the VIX level, the BBB spread, DJIA growth, growth of housing prices (one of the leading factors behind the GFC), mortgage rates, GDP growth, the unemployment rate and the inflation rate. Bank-level traits included the growth rates of assets and loans, leverage (equity to assets), tier 1 capital to assets, deposits to assets, trading assets to assets, non-performing loans (NPL), consumer loans to loans (and to assets), and real estate loans to loans (and to assets). From the CAMEL perspective, two components were omitted: earnings and management (the latter is usually proxied in research by the cost to income ratio, which actually reflects earnings). They modelled two variables, the pre-provision net revenue and net charge-offs on loans and leases, both representing earnings. Because the goal of Kapinos and Mitnik (2016) was to run stress tests, we cannot follow their methodology in this study, but we apply a similar list of macroeconomic and bank-level variables, adjusted to data availability and the specifics of European accounting.
Betz et al. (2014), using quarterly data on European banks from 2000 to mid-2013, applied CAMEL-like bank-level variables together with macroeconomic and banking sector variables. They focused on banks with total assets of at least EUR 1 billion (546 banks and 28,832 bank-quarter observations in the sample).
Macroeconomic variables were similar to those of Kapinos and Mitnik (2016), but expanded by government debt to GDP, private sector credit flow to GDP and the international investment position to GDP. The banking sector variables included, e.g., total assets to GDP, debt to equity, loans to deposits, mortgages to loans and non-core liabilities. Because these variables duplicate the bank-level variables at the sector level, we decided not to follow this approach. In Betz et al. (2014), the inclusion of macroeconomic and banking sector variables improved the performance of the distress prediction models. Peltonen, Piloiu and Sarlin (2015) introduced estimated network linkages into an early-warning model to predict distress among European banks. The authors used multivariate extreme value theory to estimate equity-based tail-dependence networks, which proxy the markets' view of bank interconnectedness in times of elevated financial stress. They found that early-warning models including estimated tail dependencies outperform bank-specific benchmark models without networks, which gives direct support for measures of interconnectedness in early-warning models.
² This methodology requires knowledge of the bank's capital adequacy (C), asset quality (A), management (M), earnings (E) and liquidity (L).
Maghyereh and Awartani (2014) applied a simple hazard model as an early warning system for bank distress in the Gulf Cooperation Council (GCC) countries. They identified a set of leading indicators of bank distress to predict the probability of bank failure in these countries. They covered a wide set of bank-level variables and other factors such as bank management, competition, diversification, ownership and regulation. They found that good management lowers the probability of distress. The effects of the bank-specific CAMEL-type variables, as well as of systematic shocks in the financial and macroeconomic environment, were in line with the findings of related empirical studies. Finally, they found that a simple hazard model performed fairly well in predicting bank distress.
Ravisankar and Ravi (2010) used less common neural network architectures for bankruptcy prediction in banks: the Group Method of Data Handling (GMDH), the Counter Propagation Neural Network (CPNN) and the fuzzy Adaptive Resonance Theory Map (fuzzy ARTMAP). The effectiveness of these techniques was tested on four datasets of Spanish, Turkish, UK and US banks. They selected the top five (Spanish dataset) or top seven (Turkish and UK datasets) variables as inputs to GMDH, CPNN and fuzzy ARTMAP for classification. The performance of these hybrid techniques (variable selection followed by classification) was compared with that of GMDH, CPNN and fuzzy ARTMAP in stand-alone mode without variable selection. Cross-validation was performed throughout the study. The results indicate that GMDH outperformed all other techniques, with or without variable selection, and that the results were much better than those reported in previous studies on the same datasets in terms of average accuracy, average sensitivity and average specificity.
Hájek, Olej and Myšková (2015) proposed a model based on the random subspace method to predict investment/non-investment rating grades of US banks. They showed that support vector machines (SVMs) can be used effectively as base learners in the meta-learning model. Both financial and non-financial (sentiment) information were important determinants in financial distress prediction.
SirElkhatim and Salim (2015) conducted a comprehensive review of the techniques used to predict bank distress. They grouped the reviewed studies by prediction technique, treating the literature from 1990–2010 as the history of prediction techniques and the literature published after 2010 and up to 2013 as recent prediction techniques, and presented the strengths and weaknesses of both categories. They found that no single technique fits all bank distress problems, although intelligent hybrid techniques were the most reliable in terms of accuracy and reputation.
Against the background of the literature review by SirElkhatim and Salim (2015) and the selected papers described above, we test two research hypotheses:
H1: There is a clear link between macroeconomic conditions and distress risk in the banking sector.
H2: There are homogeneous clusters of countries (regions) with high, medium and low risk of distress in the banking sector.
H1 is motivated by the positive results of Betz et al. (2014) and by the role of macroeconomic factors in banks' financial capacity confirmed by Kapinos and Mitnik (2016). H2 is motivated by the fact that studies of European bank distress, owing to the limited number of distress cases, have not accounted for the heterogeneity of countries.

Data and methodology
We use data on 3,691 banks with financial statements (FSs) for 1990–2015, extracted from the BankScope database. A balanced sample of 132 distressed and 132 regular FSs is used for distress prediction, while the full sample of 47,925 annual FSs is used for clustering. Betz et al. (2014) also use data on European banks, but at the quarterly FS level.
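A balanced sample of this kind can be drawn by random undersampling of the regular statements. The sketch below is only an illustration: the paper does not state how the 132 regular FSs were chosen, so uniform sampling without replacement, the helper name `balanced_sample`, the fixed seed and the placeholder statement IDs are assumptions of this sketch (matching by country or year would be a natural alternative).

```python
import random


def balanced_sample(distressed, regular, seed=0):
    """Pair the distressed statements with an equal number of regular ones.

    Hypothetical helper: regular statements are drawn uniformly at random
    without replacement, which is an assumption, not the paper's stated rule.
    Returns (statement, label) pairs with label 1 = distressed, 0 = regular.
    """
    if len(regular) < len(distressed):
        raise ValueError("not enough regular statements to balance the sample")
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    controls = rng.sample(regular, len(distressed))
    # With equal group sizes the mean of the labels is exactly 0.5.
    return [(fs, 1) for fs in distressed] + [(fs, 0) for fs in controls]


# Usage with placeholder statement IDs (not real BankScope identifiers).
sample = balanced_sample([f"D{i}" for i in range(132)],
                         [f"R{i}" for i in range(1000)], seed=1)
print(len(sample), sum(label for _, label in sample))  # prints: 264 132
```

Because both groups have the same size, the share of distressed cases in the sample is 0.5, which is why predicted probabilities from such a model are centred around 0.5 rather than around the much lower population distress rate.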
We model distress with logistic regression using stepwise selection (0.05 significance level for entry into and for staying in the model) on CAMEL-like variables (as, e.g., in Betz et al., 2014). Next, k-means cluster analysis is applied to countries, based on the distress probability in each country estimated with the prediction model on the balanced sample (mean distress probability 0.5). Countries are classified into three clusters according to their level of distress risk: low, medium and high. Macroeconomic variables are used only at the cluster level, whereas Betz et al. (2014) use macroeconomic variables as predictors in logit models.
The bank-level variables are defined as follows:
Variable | Definition | CAMEL component
… | one-year equity (EQ) growth rate | C
Growth_G_Loans | one-year gross loans (GL) growth rate | A
ROA | return on total assets ratio | E
EQ_to_TA | equity to total assets ratio | C
Deposits_to_G_Loans | total customer deposits to (gross) loans ratio | L
L_Imp_to_G_Loans | loan impairment charge (LI), i.e. loan loss provisions (LLP), to gross loans ratio | A
NIM | net interest margin to interest-earning assets | E
CI | operating cost (expense) to operating income ratio | E, M
Loans_to_TA | net loans to total assets ratio | A, M
Loans_to_Funding | net loans to customer and short-term funding ratio | L
Liquid_A_to_Funding | liquid assets to deposits and short-term funding ratio | L
Source: own elaboration
The macroeconomic variables used to describe the clusters are GDP growth, the inflation rate and the unemployment rate, all of which have been widely used in previous studies. As noted above, these macroeconomic variables are not used as predictors.
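The stepwise logit described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the paper specifies only the 0.05 entry and stay levels, so the use of likelihood-ratio tests for both steps, the Newton-Raphson fit and all function names (`solve`, `fit_logit`, `lr_pvalue`, `stepwise_logit`) are assumptions; statistical packages may use score or Wald tests instead. The data at the end are synthetic.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def fit_logit(X, y, iters=50, tol=1e-8):
    """Newton-Raphson logit with intercept; returns (coefficients, log-likelihood)."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            for a in range(k):
                grad[a] += (yi - p) * r[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * r[a] * r[c]
        step = solve(info, grad)  # Newton step: info^-1 * score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    ll = 0.0
    for r, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, r))
        # log(1 + e^eta), written to avoid overflow for large |eta|
        softplus = eta + math.log1p(math.exp(-eta)) if eta > 0 else math.log1p(math.exp(eta))
        ll += yi * eta - softplus
    return beta, ll


def lr_pvalue(ll_big, ll_small):
    """Likelihood-ratio test p-value for one extra parameter (chi-square, 1 df)."""
    stat = max(0.0, 2.0 * (ll_big - ll_small))
    return math.erfc(math.sqrt(stat / 2.0))


def stepwise_logit(X, y, names, alpha_enter=0.05, alpha_stay=0.05, max_steps=50):
    """Forward-backward stepwise selection; returns the selected variable names."""
    def cols(idx):
        return [[row[j] for j in idx] for row in X]

    selected = []
    for _ in range(max_steps):
        _, ll_cur = fit_logit(cols(selected), y)
        # Entry: the candidate with the smallest p-value, if below alpha_enter.
        best = None
        for j in range(len(names)):
            if j not in selected:
                p = lr_pvalue(fit_logit(cols(selected + [j]), y)[1], ll_cur)
                if p < alpha_enter and (best is None or p < best[1]):
                    best = (j, p)
        if best is None:
            break
        selected.append(best[0])
        # Stay: drop the weakest variable if its p-value exceeds alpha_stay.
        _, ll_full = fit_logit(cols(selected), y)
        worst = None
        for j in selected:
            rest = [m for m in selected if m != j]
            p = lr_pvalue(ll_full, fit_logit(cols(rest), y)[1])
            if p > alpha_stay and (worst is None or p > worst[1]):
                worst = (j, p)
        if worst is not None:
            selected.remove(worst[0])
    return [names[j] for j in selected]


# Synthetic check: the outcome depends on the first variable only.
rng = random.Random(42)
X, y = [], []
for _ in range(400):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    y.append(1 if rng.random() < 1 / (1 + math.exp(-1.5 * x1)) else 0)
    X.append([x1, x2])
print(stepwise_logit(X, y, ["signal", "noise"]))
```

Fitting with Newton-Raphson keeps the sketch dependency-free; in practice a statistical package would be used, and the bank-level variables from the table above would replace the synthetic columns.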

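The country clustering step can be illustrated with one-dimensional k-means on country-level mean distress probabilities. In this hypothetical sketch, bank-level predicted probabilities are averaged by country, clustered with Lloyd's algorithm into three groups, and the groups are labelled low, medium and high by their centroids. The deterministic initialisation, the function names and the placeholder country codes (`AA`, `BB`, ...) are assumptions; the paper does not report these details.

```python
from collections import defaultdict


def country_means(records):
    """Average predicted distress probability per country.

    records: iterable of (country, probability) pairs for one horizon.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for country, prob in records:
        sums[country] += prob
        counts[country] += 1
    return {c: sums[c] / counts[c] for c in sums}


def kmeans_1d(values, k=3, iters=100):
    """Lloyd's k-means on scalars; returns (centroids, cluster index per value)."""
    xs = sorted(values)
    # Deterministic start at evenly spaced order statistics (an assumption).
    cents = [xs[round(i * (len(xs) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: (v - cents[c]) ** 2) for v in values]
        new = []
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            new.append(sum(members) / len(members) if members else cents[c])
        if new == cents:
            break
        cents = new
    assign = [min(range(k), key=lambda c: (v - cents[c]) ** 2) for v in values]
    return cents, assign


def risk_clusters(country_prob):
    """Label each country 'low', 'medium' or 'high' by its cluster centroid."""
    names = list(country_prob)
    cents, assign = kmeans_1d([country_prob[n] for n in names], k=3)
    order = sorted(range(3), key=lambda c: cents[c])  # centroids, ascending
    label = dict(zip(order, ["low", "medium", "high"]))
    return {n: label[a] for n, a in zip(names, assign)}


# Placeholder countries and probabilities, not values from Table 4.
probs = country_means([("AA", 0.10), ("AA", 0.20), ("BB", 0.50), ("CC", 0.55),
                       ("DD", 0.90), ("EE", 0.95), ("FF", 0.12)])
print(risk_clusters(probs))
# {'AA': 'low', 'BB': 'medium', 'CC': 'medium', 'DD': 'high', 'EE': 'high', 'FF': 'low'}
```

In one dimension, k-means splits the sorted probabilities into contiguous groups, so ordering the centroids is enough to name the clusters; repeating the step for each horizon (1 to 4 years) gives one classification of countries per horizon.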
Empirical results
Results for prediction models are presented in Table 2, while the accuracy of models is presented in Table 3.
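The accuracy figures reported in Table 3 are shares of correctly classified statements. A minimal sketch of how such shares can be computed from predicted probabilities is shown below; the 0.5 cutoff, the function name `classification_summary` and the example predictions are assumptions for illustration, not values from the paper.

```python
def classification_summary(probs, labels, cutoff=0.5):
    """Share of correctly classified statements, overall and by group.

    probs: predicted distress probabilities; labels: 1 = distressed, 0 = regular.
    The 0.5 cutoff is an assumption that suits a balanced sample.
    """
    hits_d = sum(1 for p, lab in zip(probs, labels) if lab == 1 and p >= cutoff)
    hits_r = sum(1 for p, lab in zip(probs, labels) if lab == 0 and p < cutoff)
    n_d = sum(1 for lab in labels if lab == 1)
    n_r = len(labels) - n_d
    return {
        "accuracy": (hits_d + hits_r) / len(labels),
        "distressed_correct": hits_d / n_d,  # sensitivity
        "regular_correct": hits_r / n_r,     # specificity
    }


# Hypothetical predictions for five statements.
print(classification_summary([0.9, 0.7, 0.3, 0.2, 0.6], [1, 1, 1, 0, 0]))
# {'accuracy': 0.6, 'distressed_correct': 0.6666666666666666, 'regular_correct': 0.5}
```

Reporting the distressed and regular shares separately matters because overall accuracy alone can hide a model that classifies regular banks well but misses distressed ones.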
Overall, the accuracy of the prediction models was 72–76% (see Table 3). The highest percentage of correctly classified distressed banks was obtained for the model 4 years prior to distress. The analysed banks are located in 17 countries (see Table 4). In Denmark and Sweden, the probability of distress is low 4, 3, 2 and 1 years prior to distress. In Ireland, it is high 4, 3, 2 and 1 years prior to distress. In the UK, it is at a medium level in all years prior to distress. In one group of countries, the probability of distress increases as the distance to distress decreases (Germany, Italy, the Netherlands); in another, it decreases as the distance to distress decreases (Latvia, Cyprus, Greece, Iceland). In the remaining countries, the trend is not clear.
Results for 1 year to distress are presented in Table 5 and Figures 1 and 2. The low-risk cluster 1 year prior to distress is characterised by a high EQ_to_TA ratio, high ROA, and high Loans_to_TA and Loans_to_Funding ratios. The high-risk cluster 1 year prior to distress is characterised by a low EQ_to_TA ratio and low ROA, but high Loans_to_TA and Loans_to_Funding ratios. In terms of macroeconomic conditions, lower risk of distress is associated with a lower unemployment rate, lower inflation and higher GDP growth.
Results for 2 years to distress are presented in Table 6 and Figures 3 and 4.
The low-risk cluster 2 years prior to distress is characterised by a high EQ_to_TA ratio, high CI, and high Loans_to_TA and Loans_to_Funding ratios. The high-risk cluster 2 years prior to distress is characterised by a low EQ_to_TA ratio, low CI, and low Loans_to_TA and Loans_to_Funding ratios. In terms of macroeconomic conditions, lower risk of distress is associated with lower inflation and lower GDP growth. Unemployment is not correlated with the risk of distress at the 2-year horizon.
Results for 3 years to distress are presented in Table 7 and Figures 5 and 6. The low-risk cluster 3 years prior to distress is characterised by high NIM, high CI and a high Loans_to_Funding ratio, but a low Growth_G_Loans ratio. The high-risk cluster 3 years prior to distress is characterised by low NIM, low CI and a low Loans_to_Funding ratio, but a high Growth_G_Loans ratio. In terms of macroeconomic conditions, lower risk of distress is associated with lower inflation and lower GDP growth. Unemployment is not correlated with the risk of distress at the 3-year horizon.
Results for 4 years to distress are presented in Table 8 and Figures 7 and 8. The low-risk cluster 4 years prior to distress is characterised by a high EQ_to_TA ratio and low Deposits_to_G_Loans and Growth_G_Loans ratios. The high-risk cluster 4 years prior to distress is characterised by a high EQ_to_TA ratio, a low Deposits_to_G_Loans ratio and a high Growth_G_Loans ratio. In terms of macroeconomic conditions, lower risk of distress is associated with lower inflation and lower GDP growth. Unemployment is not correlated with the risk of distress at the 4-year horizon.
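The cluster descriptions above compare the average characteristics of the low-, medium- and high-risk clusters. One hypothetical way to produce such a comparison is to average each variable within a cluster and check whether it rises or falls from the low- to the high-risk group. The field names (`cluster`, `gdp_growth`, `inflation`), the helper names and all numbers are invented; they only mimic the qualitative 1-year pattern (higher GDP growth and lower inflation in the low-risk cluster).

```python
from statistics import mean

ORDER = ("low", "medium", "high")


def cluster_profile(rows, variables):
    """Mean of each variable within each risk cluster (None if a cluster is empty)."""
    profile = {}
    for cl in ORDER:
        members = [r for r in rows if r["cluster"] == cl]
        profile[cl] = {v: mean([r[v] for r in members]) for v in variables} if members else None
    return profile


def direction(profile, variable):
    """Whether a variable rises or falls from the low- to the high-risk cluster."""
    vals = [profile[cl][variable] for cl in ORDER if profile[cl] is not None]
    if len(vals) < 2:
        return "undetermined"
    if all(a < b for a, b in zip(vals, vals[1:])):
        return "increasing"
    if all(a > b for a, b in zip(vals, vals[1:])):
        return "decreasing"
    return "mixed"


# Hypothetical country-level rows for one horizon (values are illustrative only).
rows = [
    {"cluster": "low", "gdp_growth": 3.0, "inflation": 1.0},
    {"cluster": "low", "gdp_growth": 2.0, "inflation": 2.0},
    {"cluster": "medium", "gdp_growth": 1.0, "inflation": 2.0},
    {"cluster": "medium", "gdp_growth": 2.0, "inflation": 3.0},
    {"cluster": "high", "gdp_growth": 0.0, "inflation": 4.0},
    {"cluster": "high", "gdp_growth": 1.0, "inflation": 3.0},
]
prof = cluster_profile(rows, ["gdp_growth", "inflation"])
print(direction(prof, "gdp_growth"), direction(prof, "inflation"))  # decreasing increasing
```

Running the same profile separately for each horizon is what makes a sign change visible, such as GDP growth falling from the low- to the high-risk cluster at 1 year but rising at longer horizons.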

Conclusions
To sum up, the following CAMEL-like ratios and macroeconomic variables were found to be significant in distress prediction and important in the clustering of countries:
1. For the 1-year distance to distress, four variables were statistically significant: ROA, EQ_to_TA, Loans_to_TA and Loans_to_Funding. Among the macroeconomic variables, high distress probability is associated with low GDP growth, and lower distress probability with lower unemployment. Only at the 1-year distance is unemployment associated with distress.
2. For the 2-year distance to distress, four variables were statistically significant: EQ_to_TA, CI, Loans_to_TA and Loans_to_Funding. Among the macroeconomic variables, high distress probability is associated with high GDP growth, in contrast to the 1-year distance to distress.
3. For the 3-year distance to distress, a different set of variables was statistically significant: Growth_G_Loans, NIM, CI and Loans_to_Funding. Among the macroeconomic variables, high distress probability is associated with high GDP growth, in contrast to the 1-year distance to distress.
4. For the 4-year distance to distress, only three variables were statistically significant: Growth_G_Loans, EQ_to_TA and Deposits_to_G_Loans. Among the macroeconomic variables, high distress probability is associated with high GDP growth, in contrast to the 1-year distance to distress.
5. Betz et al. (2014) obtained similar results, although their prediction period is not fully comparable (quarterly data, 8 quarters prior to distress, recursive model).
We tested our research hypotheses on the basis of the prediction models and the clustering. We did not find support for H1, i.e. for a clear link between macroeconomic conditions and distress risk in the banking sector. This relationship is neither clear nor obvious: its direction depends on the time to distress.
We did not find support for H2 either. The clusters of low, medium and high distress risk regions are not homogeneous, and their homogeneity depends on the time to distress.