A Ranking of Combined Nomenclature Chapters According to Quality of Data on Intra‐Community Trade in Goods of Polish Businesses

Iwona Markowicz, Paweł Baran. FOE 4(343) 2019, www.czasopisma.uni.lodz.pl/foe/

Adopting the Intrastat system in Poland on its EU accession on 1 May 2004 imposed a new obligation on companies trading goods within the EU. They are obliged to provide information on their intra-Community trade in the form of monthly declarations. Data on intra-Community trade from all Member States are collected by Eurostat and disseminated in the form of the Comext database. In public statistics, special attention is being paid to data quality. It is constantly monitored and certain actions are taken to improve it. In order to assess quality of data on intra-Community trade, the authors have calculated differences between the declared values of supplies of goods from Poland and the values of foreign acquisitions originating in Poland. The aims of the paper are an analysis of quality of data on Polish intra-Community trade in goods within Combined Nomenclature chapters as well as creating a ranking of chapters with regard to data accuracy (one of quality dimensions), which we define in terms of divergence between mirror data. Data accuracy was measured with the use of aggregate data quality indices. The ranking of Combined Nomenclature (CN) chapters was presented according to the calculated index value for both intra-Community supplies of goods (ICS) and intra-Community acquisitions (ICA). We utilised data on Polish exporters' transactions from 2017 from the Comext database. In the research results, we indicate those chapters for which large relative discrepancies between mirror data are observed (thus data quality is low). For chapters with low data quality, we present inner structures of discrepancies by country and by CN position. The problem of quality of data on intra-Community trade is addressed in Poland only in publications of the Central Statistical Office/Statistics Poland. There are no scientific publications on this subject. Therefore, the authors decided to fill this gap and conduct research on sources of information which is the basis for many economic analyses.


Introduction
Adopting the Intrastat system in Poland on its EU accession on 1 May 2004 imposed a new obligation on companies trading goods within the EU. They are obliged to provide information on their intra-Community trade in the form of monthly declarations. The information is passed on to the Revenue Administration Regional Office (IAS) in Szczecin. The IAS Intrastat Department's tasks include gathering and control of INTRASTAT declarations data. The data are then pre-processed and form a statistical data file shipped to the Central Statistical Office (GUS). Data on intra-Community trade from all Member States are collected by Eurostat and disseminated in the form of the Comext database. The process of collecting the data is not straightforward, which affects quality of data in many ways. On the other hand, special attention is being paid to data quality in official statistics. It is constantly monitored and certain actions are taken to improve it. One of the factors of data quality is its accuracy (Eurostat, 2007). In order to assess this dimension of the overall quality of data on intra-Community trade, the authors have calculated differences between the declared values of supplies of goods from Poland and the values of foreign acquisitions originating in Poland. Such differences are in part a consequence of threshold values. Apart from that, they depend on the quality of the data gathered. An important practical issue is to point out those areas of trade in goods that reveal unsatisfactory quality of data.
Assessment of quality of data on intra-Community trade in goods is possible due to the characteristics of the process of collecting such data. Information comes from declarations made by entities involved in foreign trade, either intra-Community supplies (ICS) or intra-Community acquisitions (ICA). Information on both sides of a transaction is passed to Eurostat and placed side by side in the Comext database as mirror data (Baran, Markowicz, 2018a). By mirror data for two countries A and B we understand: 1) the value (i.e. monetary value) of goods shipped from country A to country B (declared as ICS in country A) alongside its mirror acquisition from country A declared in country B (declared as ICA in country B), and 2) the value of goods acquired by country A from country B (declared in country A) and its mirror dispatch of goods from country B to country A (declared in country B).
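The pairing described above can be illustrated with a minimal Python sketch. The record layout (reporter, partner, flow, value), the function name `mirror_pairs` and all figures are assumptions made for illustration only; they do not reflect the actual Comext schema or real trade values.

```python
from collections import defaultdict


def mirror_pairs(records):
    """Pair each declared dispatch (ICS) with the partner's mirror acquisition (ICA).

    records: iterable of (reporter, partner, flow, value_eur) tuples, where
    flow is "ICS" or "ICA" (a hypothetical layout, not the Comext schema).
    Returns {(origin, destination): {"ICS": value, "ICA": value}}.
    """
    pairs = defaultdict(lambda: {"ICS": 0.0, "ICA": 0.0})
    for reporter, partner, flow, value in records:
        if flow == "ICS":
            key = (reporter, partner)  # goods move from reporter to partner
        elif flow == "ICA":
            key = (partner, reporter)  # goods move from partner to reporter
        else:
            raise ValueError(f"unknown flow: {flow}")
        pairs[key][flow] += value
    return dict(pairs)


# Invented figures, for illustration only.
records = [
    ("PL", "DE", "ICS", 100.0),  # Poland declares a dispatch to Germany
    ("DE", "PL", "ICA", 90.0),   # Germany declares the mirror acquisition
    ("PL", "DE", "ICA", 50.0),   # Poland declares an acquisition from Germany
    ("DE", "PL", "ICS", 55.0),   # Germany declares the mirror dispatch
]
pairs = mirror_pairs(records)
```

In this sketch each direction of trade ends up with two values, one from each declaring side, and any difference between them is the mirror data asymmetry discussed in the paper.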
Existing differences between declared values of transactions between businesses from partner states (mirror data asymmetry) indicate the quality of collected data. More on the different causes of such asymmetries can be found in (Hamanaka, 2012; Eurostat, 2017a; 2017b; Baran, Markowicz, 2018b; GUS, 2018b).
The aims of the paper are an analysis of quality of data on Polish intra-Community trade in goods within Combined Nomenclature (CN) chapters as well as creating a ranking of chapters with regard to data accuracy which we define in terms of divergence between mirror data.

Literature review
Production of high quality statistics depends on the assessment of data quality. Without a systematic assessment of data quality, the statistical office will risk losing control of various statistical processes such as data collection, editing or weighting (Eurostat, 2007).
Several steps have been taken in Europe to focus on improving and developing a systematic approach to quality in National Statistical Institutes. The Leadership Expert Group on Quality was formed in 1999. Its aim was to attain improved quality in the European Statistical System (ESS). The ESS comprises Eurostat and the statistical offices, ministries, agencies, and central banks that collect official statistics in the EU. Product quality is the quality of the output. In the case of a statistical organisation, this is the quality of data and services provided (Eurostat, 2013).
According to Eurostat (2007), there are three aspects of statistical data quality: the characteristics of the product, its perception by the end-user, and some of the characteristics of the whole process of the statistical 'production'. All three aspects need to be taken into consideration in the data quality assessment process. As for the quality of the product itself, there are six criteria (or dimensions) defined by Eurostat (2003): relevance, accuracy, timeliness and punctuality, comparability, coherence, and accessibility and clarity. Some of them are interchangeable to a certain degree; for example, there is a trade-off between timeliness (obtaining access to data as soon as possible) and accuracy (the estimates are close or equal to true/exact values). The authors have examined the accuracy of data. We consider it one of the most important, yet underappreciated, factors of data quality from the user's point of view. In our opinion, there is a lack of awareness that when timeliness is given priority, accuracy is affected to a great extent.
Due to the specificity of data on trade between the EU countries (mirror data), it is possible to see quickly how large the inaccuracies in the data are. At the Eurostat level, information from national Intrastat systems is collected within the framework of the European Statistical System (ESS). It is a partnership in which Eurostat and the national statistical authorities of each EU Member State cooperate (European Union, 2018). Their mission is to provide independent high-quality statistical information at the European, national and regional levels and to make this information available to everyone for decision-making, research and debate. The ESS has adopted a list of principles that includes principle 4 (commitment to quality): 'statistical authorities are committed to quality', as well as principle 9: 'non-excessive burden on respondents'. According to these rules, entities trading with the EU countries make declarations in the Intrastat system. However, not all entrepreneurs are burdened with this obligation. National statistical offices establish the statistical thresholds above which declarations are mandatory. This is one of the reasons why mirror data on intra-Community trade are not fully compatible. The total turnover of the reporting agents (i.e. those which exceed the basic thresholds) may not be less than 97% of the total value of exports and 93% of the total value of imports. The value of turnover of entities that are exempted from declaring data to the Intrastat system is estimated and added to the reported turnover. Data are also estimated for 'non-response', i.e. entities which are obliged to submit Intrastat declarations but have not submitted their data by the applicable deadline (GUS, 2018a).
In April 2018, the Central Statistical Office/Statistics Poland published their work entitled "Foreign Trade. Mirror and Asymmetry Statistics" (GUS, 2018a). It is the first study in Poland devoted to the causes of discrepancies in data on intra-Community exports and imports of goods. It discusses the topic of differences in partner countries' statistics and indicates their causes. The observed data errors were found to be due to incorrectly filled in Intrastat declarations and the main reasons included: quasi-transit linked to indirect imports or exports (trade involves a non-EU country and two EU countries; external trade statistics are either overestimated or do not cover all flows); triangular transactions (the entity in the first country buys and transports goods from the second one to the third country within the EU); confusion over the repair and processing of goods (repairs should not be registered); the need to determine the cost of processing the goods; including the value of the services in the value of the goods; and incorrect classification of goods (applying incorrect CN codes).
The problem of quality of data on intra-Community trade has been addressed in Poland only in publications of the Central Statistical Office/Statistics Poland. There are no scientific publications on this subject. Therefore, the authors decided to fill this gap and conduct research on sources of information which is the basis for many economic analyses.
In the literature, the topic of quality of data on foreign trade has already been recognised as a subject of research for a long time. According to Parniczky (1980), observations of discrepancies in mirror data on trade have been present in the economic literature at least since the 1920s. Tsigas, Hertel and Binkley (1992) argue that discussion on that issue is even older, and after a seminal work by Morgenstern (1965), they date it back to the 1880s. Modern approaches to this issue emerged in the 1960s, with the work of the United Nations Economic and Social Council (1974) summarising the research during that period. The reasons for errors or irregularities in intra-Community trade mirror data are numerous. Early works on this subject include a study by Morgenstern (1965) and the United Nations (1974). An extensive review and discussion were carried out by Hamanaka (2012), who, after Federico and Tena (1991), divides the reasons for the differences between mirror data into unavoidable differences between CIF-based and FOB-based reporting, structural differences between different customs administrations' approach to transactions and/or commodities classification, as well as human errors and deliberate misclassification. Several authors (including Morgenstern, Parniczky and Hamanaka) suggest that export data are generally less accurate than import data, mostly because of the fact that governments are more interested in recording imports and applying tariffs to them.
Many authors emphasise the fact that discrepancies in foreign trade data result from errors in data entered or from deliberate concealment of economic fraud. As exports of goods and services to another Member State continue to be VAT-exempt, this has created a risk that these goods and services remain untaxed in both the supplying state and in the state of consumption (European Court of Auditors, 2016). Keen and Smith (2007) argue that VAT is vulnerable to evasion and fraud, and abuse of the weaknesses in the VAT system is a serious problem in the EU. They describe the main forms of noncompliance distinctive to VAT, consider how they can be addressed, and assess evidence on their extent in high-income countries. Pope and Stone (2009) concluded that missing trader intra-Community (MTIC) fraud had been a problem across the European Union for many years, and much had been written about its effects and how best to tackle it. The authors emphasise the nature of MTIC frauds, which exploit the zero-rated supply across national boundaries as a means for stealing revenues from national states or creating a VAT debt to be used as a subsidy for undercutting legitimate supplies.
MTIC fraud has changed over time, moving from cell phones and computer chips to other commodities (Borselli, 2008). Borselli states that MTIC can involve virtually any type of goods. According to Ainsworth (2009), in the last few months of 2009, MTIC appeared in trading in CO2 permits. In recent years, evidence of MTIC fraud involving fictitious trading in electricity and gas has also emerged and has been studied (Kim, 2017).

Statistical data and research methodology
The study was divided into the following stages: 1) analysis of quality of data on Poland's intra-Community trade divided by Combined Nomenclature chapters (both ICS and ICA); 2) creating rankings of CN chapters according to data quality defined in terms of differences between mirror data (asymmetries); 3) in chapters characterised by the largest data asymmetries, calculating data quality measures for individual directions of dispatch from and acquisition to Poland (by country); 4) in chapters with the largest data asymmetries, indicating the CN positions (4-digit) that have influenced the asymmetries the most. The research was conducted on the basis of data from Eurostat's Comext database for 2017 (as of 2 November 2018). It should be noted that the Comext database is corrected on an ongoing basis. The amendments are the result of supplementary information sent by the statistical offices of the Member States. The data included the values (in euros) of Poland's trade with other EU countries.
The level of quality of data on the trade in goods between Poland and the EU countries is a result of the discrepancies in public statistics between the reported exports and mirror imports (of the trading partner country).
The study used two types of indicators to measure the accuracy of data on intra-Community trade: individual and aggregated indices (Markowicz, Baran, 2019).
The quality of data on Poland's ICA by CN chapter was calculated with the use of an aggregated index of data quality (the authors' own proposal), denoted (1). The aggregated index takes values from 0 to 2. The higher its value, the lower the quality of the analysed data. The aggregated asymmetry index (1) is based on a different approach to the determination of the accuracy of data compared to the 'general' indices used by Eurostat (Eurostat, 2017a; 2017b; GUS, 2018b). The application of absolute differences between dispatches and acquisitions in (1) cumulates all discrepancies, so positive and negative differences do not offset one another.
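The formula for index (1) is not reproduced in the text. One form consistent with the description (absolute differences in the numerator, values ranging from 0 to 2) could be written as below. The notation is an assumption introduced here for illustration: $D_j$ is the value declared in Poland for trade with partner country $j$ within a given chapter, $M_j$ is the corresponding mirror value declared in country $j$, and $n$ is the number of partner countries. Whether the summation should also run over CN positions is left open.

```latex
W = \frac{\sum_{j=1}^{n} \left| D_{j} - M_{j} \right|}{K},
\qquad
K = \frac{1}{2} \left( \sum_{j=1}^{n} D_{j} + \sum_{j=1}^{n} M_{j} \right)
```

Under this assumed form, the numerator cannot exceed $\sum_j D_j + \sum_j M_j = 2K$, which gives the stated range from 0 to 2; the upper bound is reached only when, for every partner country, one of the two mirror values is zero.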
Combined Nomenclature chapters are numbered from 1 to 99. Number 77 serves as a reserve, number 98 is a chapter containing only 'Complete industrial plant' (which is rare and there were no such items declared as bought or sold and then relocated to/from Poland within the period under consideration), and number 99 is a chapter containing 'Special Combined Nomenclature codes' (e.g. transactions with no partner country specified or classified trade). This is why we omitted these three chapter numbers in our analysis.
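As a quick check of the resulting number of chapters, a minimal Python sketch (the variable names are illustrative only):

```python
# Chapter numbers run from 1 to 99; three of them are left out of the analysis.
OMITTED = {77, 98, 99}
chapters = [c for c in range(1, 100) if c not in OMITTED]

print(len(chapters))  # 96
```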
The quality of data on Polish ICS by CN chapters and shipping countries was calculated using the individual asymmetry index, denoted (2). The individual index takes values from −2 to 2. Positive values mean that Polish ICS was higher than the mirror ICA of the trade partner's country. Negative values indicate the predominance of the mirror value of ICA.
In the study, the first formula, i.e. the average value for exports and mirror imports, was used to determine the K-value in the denominator of (2). This allowed us to avoid favouring one of the trading parties.
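The formulas themselves are not reproduced in the text, so the following Python sketch rests on assumptions: the individual index is taken as the difference between the declared and the mirror value divided by K, with K being the average of the two values, and the aggregated index as the sum of absolute differences divided by the average of both totals. The function names are hypothetical and the figures are invented.

```python
def individual_index(declared, mirror):
    """Assumed form of the individual index: (declared - mirror) / K,
    where K is the average of the declared and the mirror value."""
    k = (declared + mirror) / 2
    if k == 0:
        raise ValueError("both values are zero; the index is undefined")
    return (declared - mirror) / k


def aggregate_index(pairs):
    """Assumed form of the aggregated index over (declared, mirror) pairs,
    one pair per partner country: sum of absolute differences divided by
    the average of the two totals."""
    pairs = list(pairs)
    k = (sum(d for d, _ in pairs) + sum(m for _, m in pairs)) / 2
    if k == 0:
        raise ValueError("all values are zero; the index is undefined")
    return sum(abs(d - m) for d, m in pairs) / k


# Invented values: the two differences cancel in total (200 vs 200),
# but the aggregated index still records them.
example = [(120.0, 80.0), (80.0, 120.0)]
print(individual_index(100.0, 0.0))   # 2.0, declared on one side only
print(individual_index(0.0, 100.0))   # -2.0, declared on the mirror side only
print(aggregate_index(example))
```

Dividing by the average of both values treats the two trading parties symmetrically, which is the motivation given above, and it bounds the individual index between −2 and 2.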

Research results
The results of the research presented in the article concern the evaluation of the quality of data on trade transactions between Poland and the EU countries in 2017. It should be mentioned that previous analyses indicate a good position of Poland in the ranking of the EU countries. The ranking was created on the basis of the level of aggregated indices of asymmetries between data on trade of individual countries with other EU countries. In terms of intra-Community supplies, Poland ranked 10th among 28 countries (the aggregated index was 0.0925). The highest quality of data was recorded in Germany (0.0517) and the lowest (we consider them outliers) in Malta (0.4893) and Cyprus (0.5721). In terms of quality of data on intra-Community acquisitions of goods, Poland ranked 8th (0.0760). The ranking started with the Netherlands (0.0417) and ended with Malta (0.4542). Of course, one should remember that the assessment of the quality of a country's data is influenced by actions aimed at improving the completeness, correctness and timeliness of the declarations collected in this particular country, but also by the quality of data of the partner countries.
The results of the research are presented in the following order: first, a ranking of CN chapters according to the quality of mirror data (aggregated indices); second, for the chapters with the lowest quality of data, an indication of the partner countries with which the transaction values are the least convergent; third, an indication of the CN positions with the greatest divergences of data in the analysed chapters. These steps are presented for both supply (ICS) and acquisition (ICA) of goods.

ICS declared in Poland: quality of data
For 96 CN chapters, we calculated aggregated indices of mirror data asymmetries regarding dispatches of goods originating in Poland (Polish ICS). Figure 1 presents the chapters with the highest index values, with the first five marked. The quality of data in these chapters is low, and the procedure for explaining the large discrepancies in the mirror data should focus on the trade in goods from these chapters. They are: chapter 50 (Silk), chapter 14 (Vegetable plaiting materials; vegetable products), chapter 93 (Arms and ammunition; parts and accessories thereof), chapter 89 (Ships, boats and floating structures), and chapter 97 (Works of art, collectors' pieces and antiques). The indices calculated for these chapters were 1.5668, 1.2486, 1.1495, 1.0578 and 0.9372, respectively. For comparison, Figure 1 also shows the chapters with the highest data quality and identifies two of them: chapter 29 (Organic chemicals) and chapter 84 (Nuclear reactors, boilers, machinery and mechanical appliances; parts thereof). In these cases, the values of the indices were 0.0701 and 0.0674.
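The ranking step can be sketched in Python using only the seven index values quoted above (the full ranking covers 96 chapters; the remaining values are not given in the text, and the variable names are illustrative):

```python
# Aggregated index values for Polish ICS quoted in the text (2017),
# keyed by CN chapter number.
ics_index = {
    50: 1.5668, 14: 1.2486, 93: 1.1495, 89: 1.0578, 97: 0.9372,
    29: 0.0701, 84: 0.0674,
}

# A higher index means lower data quality, so sort in descending order.
ranking = sorted(ics_index, key=ics_index.get, reverse=True)
worst_five = ranking[:5]
best = ranking[-1]

print(worst_five)  # [50, 14, 93, 89, 97]
print(best)        # 84
```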
Within the chapters with the lowest quality of data, individual indices were calculated by partner country. Positive values of the index mean that the values of ICS declared in Poland are higher than the declared values of mirror acquisitions from Poland by businesses in the analysed country. Negative values of the index indicate the opposite. Table 1 shows the number of countries for which the individual indices indicated low quality of mirror data. These are values less than or equal to −1 or greater than or equal to 1 (for comparison, information for two chapters with the highest data quality is also given).

In Table 1, the extreme values of indicators, i.e. −2 and 2, are considered. These are situations when one of the mirror values is equal to zero. In order to avoid indicating high values of individual indices for countries with low transaction values, we assumed that the value of the ICS from Poland or the mirror value of the ICA must exceed EUR 25 000. The extreme values of the index were found in chapter 50 for Ireland (−2; no declarations on the Polish side) and in chapter 93 for Bulgaria, Romania and Hungary (2; declarations on the Polish side only). The analysis by country provides us with two general conclusions. If transactions in a given chapter are asymmetrically documented for a small number of countries (chapter 50: 4 countries), our attention should be paid to these directions of dispatches. However, if such an asymmetry concerns a large number of countries in a given chapter (chapter 93: 16 countries), one should rather look for systematic errors (e.g. incorrect coding of goods). In the five analysed chapters, the following countries have emerged most frequently (3 times): Sweden, Great Britain, Ireland, Italy and Spain. For comparison, Table 1 includes the chapters with the highest quality of data. For chapter 84 no country is indicated and for chapter 29 there are only two countries (including Malta).
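The selection rule described above can be sketched in Python as follows. The form of the individual index (difference divided by the average of the two mirror values) is an assumption, the function name `flag_countries` is hypothetical, and the country values are invented; only the EUR 25 000 threshold and the cut-off of 1 in absolute value come from the text.

```python
THRESHOLD_EUR = 25_000  # minimum value quoted in the text


def flag_countries(flows, threshold=THRESHOLD_EUR):
    """Return partner countries whose individual index indicates low quality.

    flows: {country: (ics_declared_in_poland, mirror_ica)} in EUR.
    A country is flagged when the absolute index is at least 1 and at
    least one of the two values exceeds the threshold.
    """
    flagged = {}
    for country, (ics, ica) in flows.items():
        if max(ics, ica) <= threshold:
            continue  # both values too small to be informative
        index = (ics - ica) / ((ics + ica) / 2)  # assumed form of the index
        if abs(index) >= 1:
            flagged[country] = index
    return flagged


# Invented values, for illustration only.
flows = {
    "IE": (0.0, 40_000.0),           # declared on the mirror side only
    "BG": (30_000.0, 0.0),           # declared on the Polish side only
    "DE": (1_000_000.0, 950_000.0),  # small relative difference
    "MT": (10_000.0, 0.0),           # below the threshold, skipped
}
print(flag_countries(flows))  # {'IE': -2.0, 'BG': 2.0}
```

Counting the flagged countries per chapter gives the kind of summary reported in Table 1: a short list points to specific directions of dispatch, a long one to a possible systematic error.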
In Figure 2, values of individual data quality indicators for chapter 97 are presented (as an example). Four negative indices and two positive indices indicating poor data quality (listed in Table 1) are highlighted. The differences in mirror values are also given. The highest differences in data are for the trade between Poland and Germany (EUR 1.7 billion) and between Poland and Spain (EUR −1.4 billion).
Since every CN chapter covers a certain group of goods, we have also examined which CN positions (4-digit) have the greatest impact on the poor quality of data in the five chapters mentioned in Table 1. These include the following goods: in chapter 50, Woven fabrics of silk or of silk waste (position 5007); in chapter 14, Vegetable materials of a kind used primarily for plaiting, for example, bamboos, rattans (position 1401) and Vegetable products (

ICA declared in Poland: quality of data
The analysis of intra-Community acquisitions by Polish businesses (Polish ICA) was carried out in the same way as the previous analysis of Poland's ICS. Figure 3 presents the chapters with the highest (and lowest) values of the indices, with the first three chapters marked. These are chapters 89, 14 and 97 (indices: 1.5681, 1.0820 and 0.9807, respectively). They already appeared in the ICS analysis. The chapters with the highest quality of data are chapter 18 (Cocoa and cocoa preparations) and chapter 39 (Plastics and articles thereof). In these cases, the calculated values of the indices were close to each other (0.0603 and 0.0600).
Table 2 shows the number of countries for which the individual indices indicated low quality of mirror data. For comparison, information for two chapters with the highest quality of data is also given. We have observed the extreme values of indices (i.e. −2 or 2) in chapter 89 for Portugal, Austria, Estonia, Croatia, Hungary, Lithuania and Slovenia (−2; no declarations on the Polish side) and for Cyprus, Ireland and Luxembourg (2; declarations on the Polish side only); in chapter 14 for Portugal (−2) and Slovakia (2); and in chapter 97 for Bulgaria, Hungary and the Netherlands (all −2). Thus, there are more extreme values than in the case of the Polish ICS. Extreme values also occur in chapter 18 (high data quality), but only for Cyprus and Malta (2; these countries constitute a low-data-quality group within the EU). Hungary appears in all three indicated chapters with low quality of data.
Figure 4 shows the values of individual data quality indices for chapter 97 (as an example). Six negative indices indicating poor data quality (listed in Table 2) are highlighted. Differences in mirror values are also given. The highest differences concern Poland's trade with Germany (EUR −1.1 billion) and with France (EUR −0.9 billion).
We have also examined CN positions to find out which have the greatest impact on the poor quality of data on ICA in the three chapters indicated. These include the following goods: in chapter 89, Cruise ships, excursion boats, ferry-boats, cargo ships, barges (position 8901) and Light-vessels, fire-floats, dredgers, floating cranes, and other vessels the navigability of which is subsidiary to their main function; floating docks; floating or submersible drilling or production platforms (position 8905); in chapter 14, Vegetable products (position 1404); and in chapter 97, Collections and collectors' pieces of zoological, …, ethnographic or numismatic interest (position 9705) and Antiques of an age exceeding 100 years (position 9706).

Conclusions
The most important results of the research are as follows: 1) indication of the CN chapters with the greatest discrepancies in data (ICS or ICA values and mirror values); 2) indication of countries with which Poland's trade is asymmetrically documented; 3) indication of the commodity items within the CN chapters with the lowest data quality.
The results obtained may serve as a basis for searching for ways to improve quality of data on intra-Community trade. They also stimulate further in-depth research. The dilemmas that have arisen in the course of the study are twofold. One question is whether we should choose between applying general and aggregated indicators or rather try to combine them. The other question is how to properly calculate the indices for CN chapters: whether we should aggregate data by country or by country and CN positions at the same time.