Chapter 6
Estimating Inter-regional Migration in Europe

Adam Dennett and Alan Wilson

6.1 Introduction

In this chapter, we show how methods of biproportional fitting – assembled through the use of entropy-maximising methods – can be used to generate estimates of missing data, and particularly flows, from partially complete sets of data. This enables us to generate inter-regional migration flows within Europe.

Understanding migration is one of the enduring challenges facing geographers and demographers worldwide. The challenge persists, thanks to the range of territories and geographical scales of interest, the difficulty in dealing with inconsistent definitions of migrants and migration events, the variable (and often poor) quality of data and the large and sometimes complex array of tools available. While an understanding of migration patterns and processes at the global scale presents possibly the largest challenge, in Europe we still know far less about the movements of people within the Union than may be expected given the continued desire for knowledge about population change and the amount of demographic data made available from member countries (Poulain et al., 2006). Acknowledging this, a number of recent projects have made attempts to address some of the limitations of (intra-) European migration data. Against a background of varying migrant definitions, inconsistent data relating to the same flows collected for origins and destinations, and incomplete matrices, the MIMOSA (Modelling migration and migrant populations) project (Raymer and Abel, 2008), produced a series of inter-country migration estimates for years between 2002 and 2006 through harmonising available data and using a multiplicative modelling framework to model flows between countries. Following on from this, the IMEM (Integrated Modelling of European Migration) project (van der Erf et al. – http://www.nidi.nl/Pages/NID/24/842.TGFuZz1FTkc.html) is currently looking to improve upon the methodology employed in MIMOSA through a Bayesian statistical approach. Further work has also been carried out by Abel (2010) who used a negative binomial regression (spatial interaction) model to estimate inter-country flows using a suite of predictor variables.

All of these projects have limited their scope to inter-country flows, but within Europe much of the focus of the EU commission is on regional policy (http://ec.europa.eu/regional_policy/index_en.cfm) which is intended to address the quite marked socio-economic disparities which persist between smaller zones within the Union. A recent project which had a partial focus on migration at the regional (Nomenclature of Territorial Units for Statistics level 2 – NUTS2) level in the European Union was the DEMIFER project (De Beer et al., 2010). One of the outputs from this project is a set of regional population projections for four different growth/cohesion scenarios which include a model of regional in- and out-migration based upon annual transition rates (Kupiszewska and Kupiszewski, 2010). While in- and out-migration rates tell us something about migration at the regional level within Europe, they reveal little about the interaction between regions and the hotspots of population exchange which occur within the Union helping drive the dynamism and evolution of local population structures. Indeed our knowledge of these exchanges across the whole Union is poor.

Within the United Kingdom, migration policy is rarely far from the headlines, although as Cangiano (2011) points out, there has been a certain disconnect between immigration policy and wider acknowledgement of demographic issues such as the ageing of the population. Compounding these macro policy problems, there is a local dimension to demographic issues and a current knowledge gap in relation to local immigration concentrations and emigration flows. The UK government has a limited capacity to predict or control the flows of EU nationals into the country and then where in the country they go once they have arrived; conversely knowledge of areas which are likely to experience increased pressures due to migration is vital for effective policy decisions to be made. Where these issues exist in the United Kingdom, we can be sure that similar issues are experienced in other EU member states.

Therefore, in this chapter, we propose a methodology for estimating the inter-regional flows which pose these particular local policy problems within Europe. The work builds on previous research which has made use of variations on the entropy maximising spatial interaction models (SIMs) first introduced by Wilson (1970, 1971) and used in migration research (He and Pooler, 2003; Plane, 1982; Stillwell, 1978). A new Multi-Level Spatial Interaction Model (MLSIM) is proposed which incorporates data at both country and regional levels in Europe to produce estimates of the inter-regional inter-country flows consistent with known information at these different levels. The heart of the method is biproportional fitting.

6.2 The Spatial System and the Modelling Challenge

The year for which the maximum amount of migration data at all levels is available is 2006, and so we use this as our temporal base. The spatial system of 287 NUTS2 regions nested within 31 countries (EU 27 + Norway, Iceland and Switzerland – which will be referred to as the ‘EU system’ in this chapter subsequently) is shown in Figure 6.1. Migration data for some of the flows occurring are available. These data, along with cells representing missing data, can be visualised as an origin/destination matrix as shown for a sample of countries in Figure 6.2. The grey cells in Figure 6.2 represent inter-regional intra-country (internal migration) flow counts which are available for most countries in the system. Flows within NUTS2 regions (the white cells on the diagonal) are not included in this analysis. The internal migration data were collated for use in the ESPON-funded DEMIFER project (http://www.espon.eu/main/Menu_Projects/Menu_AppliedResearch/demifer.html), although in almost all cases, these data are freely available from the Eurostat statistics database (often referred to as ‘New Cronos’ – http://epp.eurostat.ec.europa.eu/portal/page/portal/statistics/search_database). Internal migration data for two countries – France and Germany – are not available on this database and were procured separately for DEMIFER from national statistical agencies. It should be noted, though, that while technically European NUTS2 zones, the French overseas departments of Guadeloupe, Martinique, Reunion and French Guiana are not included. The coloured cells represent inter-country flows. Consistent estimates of international (intra-Europe) origin/destination flows have been created for the 31 countries for our year of interest by Raymer and colleagues for the MIMOSA project (Raymer and Abel, 2008).


Figure 6.1 The 287 NUTS2 regions of EU 27 + 3 countries.

Source: Reproduced with permission from Dennett and Wilson (2013)


Figure 6.2 Example migration data availability within Europe

Missing data in this EU system matrix are the inter-country, inter-regional flows – for example, the flows from the three zones in Country 1 to the three zones in Country 3 which sum to the 4,856 migrants we know flowed between Country 1 and Country 3 in Figure 6.2. The modelling challenge, therefore, is to estimate these missing data in the matrix, making use of information available at both the country and regional levels. The ultimate goal is to produce a full set of inter-regional estimates which make the most use of all available flow information at all levels within the system. Therefore, it will be necessary to understand the full range of the models which can be built from the elements of the migration system. In defining a suite of models, it will become apparent that some are more likely to produce better results than others in different data scenarios: the model which produces the best results in the current data scenario may not be feasible where fewer data exist, and so other, less optimal models in the family might produce the next best estimates given different data availability.

One question that arises from this challenge in the current context is whether it is feasible to treat this 287 zone EU system as a whole when it is the convention to make a distinction between ‘internal migration’ flows and ‘international migration’ flows. It could be argued that where national borders are real barriers to travel, two systems should be defined. However, in a post-Schengen Europe (Convey and Kupiszewski, 1995; Kraler et al., 2006) national boundaries are not the rigid constructs (both metaphorically and physically) they once were, with flows of migrants between member countries now (in principle) as easy as flows within them. Indeed it is not uncommon for another type of human flow – daily commutes – to occur between countries such as Denmark and Sweden or Luxembourg and Belgium (Mathä and Wintr, 2009). With this being the case, we might expect internal migration and international migration in these areas of Europe to be virtually interchangeable in terms of, for example, the motivations for moves or the limiting factors such as distance which curtail flows. Whether this is actually the case will be explored through the modelling experiments with different models in the family, which are detailed later in the chapter.

6.3 Biproportional Fitting Modelling Methodology

To achieve the task set out in Section 6.1, we will make use of a variation on the doubly constrained entropy maximising SIM (Wilson, 1970, 1971). SIMs are particularly appropriate in the context of migration where empirical studies and model experiments have demonstrated that the propensity to migrate decreases with distance (Boyle et al., 1998; Flowerdew, 2010; Fotheringham et al., 2004; He and Pooler, 2003; Singleton et al., 2010; Stillwell, 1978; Taylor, 1983). Indeed, Olsson (1970, p. 223) notes that ‘Under the umbrella of spatial interaction and distance decay, it has been possible to accommodate most model work in transportation, migration, commuting and diffusion’.

If c06-math-0001 is the number of migrant transitions (Rees, 1977), let capital letters such as c06-math-0002 and c06-math-0003 denote countries and let lower case letters such as c06-math-0004 and c06-math-0005 denote NUTS2 regions within a country. Then let c06-math-0006 be the number of migrants from country c06-math-0007 to country c06-math-0008 in some time period, say c06-math-0009 to c06-math-0010 (which we will leave implicit for ease of notation). Then we can denote by c06-math-0011 the number of migrants from region c06-math-0012 in c06-math-0013 to region c06-math-0014 in c06-math-0015. For convenience we denote all the migration flows by c06-math-0016, but the different subscripts and superscripts indicate the different geographical levels in the system. This notation implies that we number the NUTS2 zones from c06-math-0017 for country c06-math-0018 rather than numbering them consecutively for the whole system.

The available data described in Figure 6.2 can then be shown as in Figure 6.3. We have inter-regional, intra-country data for each country – c06-math-0019 where I = J. These internal migration flows could also be described with the notation c06-math-0020 to distinguish them from inter-country inter-regional flows. Intra-regional flows – c06-math-0021 – are not available. At the country level, inter-country flows c06-math-0022 are available.


Figure 6.3 Sample system in Figure 6.2 using defined notation

The row and column totals are known for the c06-math-0023 elements, that is, at the NUTS2 level, and also for the c06-math-0024 inter-country levels. Let these be c06-math-0025 and c06-math-0026 and c06-math-0027 and c06-math-0028, respectively, so that:

These row and column totals are depicted in expanded versions of Figures 6.2 and 6.3, as shown in Figure 6.4a and b. Note that the c06-math-0033 and c06-math-0034 totals do not include intra-country data contained in the c06-math-0035 and c06-math-0036 totals – consistent with the common practice of not including intra-country flows in international migration analysis. Internal migration data are assumed to be consistent such that:

6.5 equation

Figure 6.4 Expanded sample system with margins and sub-margins

The sample data shown in Figure 6.4a and b represent the information we currently have about our system of interest. The formulation thus far implies that we are not seeking to model flows at the NUTS2 level within each country c06-math-0038 (we have these data) and to and from other countries, c06-math-0039. The ultimate modelling goal, however, is to estimate these inter-country regional level flows, effectively filling all c06-math-0040 interior cells in the matrix.

In order to model these NUTS2 level flows between countries, we introduce another element of notation: c06-math-0041 and c06-math-0042 are, respectively, the out-migration flows from NUTS2 c06-math-0043 in country c06-math-0044 to country c06-math-0045 and the in-migration flows to NUTS2 c06-math-0046 in country c06-math-0047 from country c06-math-0048. c06-math-0049 and c06-math-0050 can be viewed as table sub-margins and are equivalent to c06-math-0051 and c06-math-0052 (where the country subscripts are dropped as flows are internal) so that

Then c06-math-0055 in (6.3) and (6.4), for c06-math-0056, would be given by

6.8 equation

These sub-margin elements are shown in Figure 6.5a and b. In addition to these new sub-margins, two new row and column margins can also be calculated. c06-math-0058 and c06-math-0059 are directly related to c06-math-0060 and c06-math-0061 in that:

6.10 equation
6.12 equation

Figure 6.5 Sample system including all sub-margin and margin elements

A final set of margins can be calculated for all interior cells in the matrix where

With a complete system description, we can then consider the variety of models which can be built. Equations (6.1)–(6.4), (6.6), (6.7), (6.9), (6.11), (6.13) and (6.14) can provide the core constraint equations for a suite of entropy maximising models, which can be used to estimate various elements and aggregations of the c06-math-0068 flows in the multi-level system matrix. We might describe this as a family of MLSIMs, with the model possibilities being the following:

  i. Model the NUTS2 flows within each country separately – that is, model c06-math-0069 (in which case c06-math-0070 simply functions as a label for each country model). Equations (6.1) and (6.2) would be the accounting/constraint equations.
  ii. Model the inter-country flows, c06-math-0071, separately. Equations (6.3) and (6.4) would be the accounting equations.
  iii. Model asymmetric NUTS2 flows c06-math-0072 and c06-math-0073 in and out of each c06-math-0074 and c06-math-0075, c06-math-0076 and c06-math-0077. Three versions of the asymmetric model can be formulated.
    a. Equations (6.9) and (6.3) would hold as accounting/constraint equations for Equation (6.6) and Equations (6.11) and (6.4) would be the constraints for Equation (6.7).
    b. Known c06-math-0078 flows with Equation (6.9) would hold as constraints for Equation (6.6) and known c06-math-0079 flows with Equation (6.11) would hold as constraints for Equation (6.7).
    c. It would also be possible to use Equations (6.13) and (6.3) as the constraints for Equation (6.6) and Equations (6.14) and (6.4) as the constraints for Equation (6.7). This model is almost identical to (a), although in this case we would also be modelling c06-math-0080 as c06-math-0081 and c06-math-0082 as c06-math-0083.
  iv. Model c06-math-0084 for each country separately using sub-margins (6.6) and (6.7) as constraints.
  v. Model c06-math-0085 where c06-math-0086 with Equations (6.9) and (6.11) as constraints.
  vi. Model the full array of NUTS2 regions, c06-math-0087 using Equations (6.13) and (6.14) as the accounting constraints.

If the accounting equations (6.1)–(6.4) are deployed as in Models (i) and (ii), this leads to the construction of doubly constrained models for which the main task would be to identify impedance functions, associated generalised costs c06-math-0088, and the model parameter values. In migration research, cost is often the physical distance between places: the propensity to migrate decreases with distance and thus the cost of travel can be inferred to increase. Empirical studies have shown that this distance decay in migration propensity will often follow either a negative exponential or inverse power law (Stillwell, 1978). In SIMs, this is represented by a parameter c06-math-0089 (normally negative), which can be calibrated endogenously if data exist. In the equations which follow, we write the distance decay function c06-math-0090 as exponential – c06-math-0091 – although it would be just as appropriate to write it as a power law – c06-math-0092.
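The doubly constrained form that these accounting equations lead to can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation used for this chapter: the function name `doubly_constrained`, the toy margins and the toy cost matrix are hypothetical, flows take the standard Wilson form T_ij = A_i O_i B_j D_j f(c_ij), and intra-zonal flows are excluded as they are in this analysis. The balancing factors A_i and B_j are found by alternating between the two margin constraints, which is the biproportional fitting step.

```python
import numpy as np

def doubly_constrained(O, D, cost, beta, decay="exp", tol=1e-9, max_iter=5000):
    """Sketch of a doubly constrained spatial interaction model.

    O, D  : known out- and in-migration totals per zone (same grand total)
    cost  : square matrix of generalised costs (e.g. distances) between zones
    beta  : distance decay parameter (normally negative)
    decay : "exp" for exp(beta * c), "power" for c ** beta
    """
    O = np.asarray(O, dtype=float)
    D = np.asarray(D, dtype=float)
    cost = np.asarray(cost, dtype=float)
    off_diag = ~np.eye(len(O), dtype=bool)

    # Distance decay function; the diagonal stays at zero so that
    # intra-zonal flows are never estimated.
    f = np.zeros_like(cost)
    if decay == "exp":
        f[off_diag] = np.exp(beta * cost[off_diag])
    else:
        f[off_diag] = cost[off_diag] ** beta

    A = np.ones_like(O)
    B = np.ones_like(D)
    for _ in range(max_iter):
        A = 1.0 / (f @ (B * D))       # origin balancing factors
        B = 1.0 / (f.T @ (A * O))     # destination balancing factors
        T = np.outer(A * O, B * D) * f
        # Column sums match D exactly after updating B, so test the rows.
        if np.abs(T.sum(axis=1) - O).max() < tol:
            break
    return T

# Toy system (hypothetical values): four zones placed on a line.
O = [100, 200, 300, 400]
D = [250, 250, 250, 250]
x = np.array([0.0, 1.0, 3.0, 6.0])
cost = np.abs(x[:, None] - x[None, :])
T = doubly_constrained(O, D, cost, beta=-0.5)
```

Passing `decay="power"` switches to the inverse power form with the same sign convention for the parameter.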

6.3.1 Model (i)

Model (i) is the most straightforward and would produce

6.15 equation

where the generalised distance decay parameter c06-math-0096 can be calibrated endogenously using c06-math-0097 data. An alternative version of this model could calculate origin or destination-specific c06-math-0098 parameters:

6.3.2 Model (ii)

The inter-country Model (ii) would be

6.20 equation

where balancing factors are calculated with equivalent equations to (6.16) and (6.17).

6.3.3 Model (iii)

The asymmetric models in Model (iiia) would take the form

With the balancing factors for (6.21):

6.23 equation
6.24 equation

and the balancing factors for (6.22):

6.25 equation
6.26 equation

Equations (6.21) and (6.22) can be visualised easily by collapsing the matrices in Figure 6.5a and b into just the relevant margins and sub-margins (Figures 6.4a and b, 6.5a and b). These margins then become, effectively, the c06-math-0110 values in a standard two-dimensional matrix.

It is important to note that while in the examples in Figures 6.6a and 6.7a, corresponding country to country sums are equal – for example c06-math-0111 – as they should be, in Model (iiia) the modelled values will not correspond in this way, due to the constraints used. To exemplify, consider Figures 6.8 and 6.9. The marginal values in these figures are almost identical to those in Figures 6.6a and 6.7a (only two migrants are misplaced in Figure 6.8). The interior c06-math-0112 and c06-math-0113 values, however, are quite different. In these modelled matrices, c06-math-0114. For example, the total flows from Country 1 to Country 2 in Figure 6.8 are 6,915, whereas the total flows from Country 1 to Country 2 in Figure 6.9 are 7,776. The reason for this is that the c06-math-0115 and c06-math-0116 flows are only constrained to the marginal totals – either c06-math-0117 and c06-math-0118 or c06-math-0119 and c06-math-0120, respectively. In these models, c06-math-0121 and c06-math-0122 have multiple equilibria, only a small number of which result in c06-math-0123. This has implications for Model (iv) in our suite of models.


Figure 6.6 Collapsed matrix showing only region-to-country sub-margins depicted in Figure 6.5


Figure 6.7 Collapsed matrix showing only country-to-region sub-margins depicted in Figure 6.5


Figure 6.8 c06-math-0108 values modelled using the entropy-maximising model in (6.21)


Figure 6.9 c06-math-0109 values modelled using the entropy-maximising model in (6.22)

6.3.4 Model (iv)

Model (iv) takes c06-math-0124 and c06-math-0125 as constraints, with the doubly constrained version of the model defined as

With the balancing factors for (6.27):

6.29 equation
6.30 equation

If c06-math-0130, then it is possible to solve Equations (6.27) and (6.28) – the iterative procedure which calculates the c06-math-0131 and c06-math-0132 balancing factors is able to converge when c06-math-0133 and its corresponding sub-margin c06-math-0134 are the same value. If c06-math-0135 and c06-math-0136 values are estimated using the entropy-maximising procedure described in Equations (6.21) and (6.22), then c06-math-0137, meaning that the iterative balancing factor routine will not converge and Equations (6.27) and (6.28) cannot be solved.
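This convergence condition can be checked before any balancing factors are computed. The sketch below is an illustration only: the function name `submargins_consistent` and the toy numbers are hypothetical, and it assumes the condition amounts to both sets of sub-margins implying the same total for a given pair of countries. It takes the region-to-country out-migration sub-margins and the country-to-region in-migration sub-margins for one country pair and tests whether their totals agree.

```python
import numpy as np

def submargins_consistent(out_sub, in_sub, rtol=1e-9):
    """Return True when both sub-margins imply the same country-pair total.

    out_sub : flows from each origin region of country I to country J
    in_sub  : flows into each destination region of country J from country I
    """
    return bool(np.isclose(np.sum(out_sub), np.sum(in_sub), rtol=rtol))

# Toy values (hypothetical): both sub-margins sum to 600 migrants.
ok = submargins_consistent([100, 200, 300], [250, 350])

# Sub-margins modelled independently need not agree (600 against 650 here),
# in which case a doubly constrained fit to both cannot converge.
not_ok = submargins_consistent([100, 200, 300], [250, 400])
```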

One solution to this issue is to estimate c06-math-0138 and c06-math-0139 using a method other than the entropy-maximising model described. As already noted, c06-math-0140 and c06-math-0141 are equivalent to c06-math-0142 and c06-math-0143. In this system, we already know the values of c06-math-0144 and c06-math-0145 from the c06-math-0146 internal migration data available. Given this information, the following equations can be used to estimate c06-math-0147 and c06-math-0148:

where these c06-math-0151 and c06-math-0152 estimates are constrained to the corresponding c06-math-0153 values, c06-math-0154, and thus it is possible to solve Equations (6.27) and (6.28).

There is, however, an entropy-maximising solution to this issue as well. In Model (iiib), the constraints used to estimate c06-math-0155 and c06-math-0156 are not the matrix margins as shown in Figures 6.6 and 6.7. By using these margins in Model (iiia), we are not taking advantage of all known information in the system. As c06-math-0157 flows are known, a combination of matrix margins and known interior c06-math-0158 values can be used as constraints, thus the equations for c06-math-0159 and c06-math-0160 become

with the balancing factors for (6.33) calculated:

6.35 equation
6.36 equation

and the balancing factors for (6.34):

6.37 equation
6.38 equation

In constraining c06-math-0167 and c06-math-0168 to c06-math-0169 flows, c06-math-0170. This means that when Equations (6.33) and (6.34) are used as inputs into (6.27) and (6.28) in Model (iv), the balancing factors will always converge and the equations can be solved. Model (iv) represents the c06-math-0171 estimates which will adhere most closely to the known information about the system, and as such might be described as the optimum model for the EU system in this study.

6.3.5 Model (v)

If Model (iv) is the optimum model, then Models (v) and (vi) which produce alternative c06-math-0172 estimates using less information might be described as being suboptimal. Model (v) will only produce c06-math-0173 estimates where c06-math-0174. This model can be written as

6.39 equation

where

6.40 equation
6.41 equation

In this model, c06-math-0178 and c06-math-0179 can be estimated in exactly the same way as c06-math-0180 and c06-math-0181 in Equations (6.31) and (6.32) so

6.42 equation
6.43 equation

The c06-math-0184 estimates in Model (v) will not adhere as closely to known c06-math-0185 values as those in Model (iv), as the constraints are the outer margins on the expanded matrix shown in Figure 6.5.

6.3.6 Model (vi)

Finally, Model (vi) models the whole c06-math-0186 matrix, including c06-math-0187 flows. This model (with an origin-specific distance decay parameter) takes the form

6.44 equation

where

6.45 equation
6.46 equation

with the c06-math-0191 and c06-math-0192 constraints calculated as in Equations (6.13) and (6.14).

This new family of doubly constrained MLSIMs allows estimates of a full matrix of 287 × 287 flows within the defined European system to be made. While Model (iv) defined in Equations (6.33) and (6.34) will produce estimates which are forced to adhere most closely to the known information in the system, other models in the family, which by definition will produce results constrained to less information, will allow us to examine features of the European migration system which do not fit our model assumptions. In doing this we might, for example, be able to identify areas where it would be prudent to adjust the cost proxy in order to distribute migrant flows more effectively within the system without the ‘helping hand’ that constraints give, or indeed answer the question posed in the introduction to this chapter relating to whether it is feasible to treat the European system as, effectively, an internal migration system where national boundaries have little influence on migration flows. First, however, a number of technical challenges relating to the implementation of the models need to be overcome.

6.4 Model Parameter Calibration

All of the models described in the MLSIM family make use of a calibrated distance decay parameter (or parameters), but in making use of such a parameter, a number of problems present themselves. Firstly, calibration can only be carried out using known data within the system – therefore, the c06-math-0193 parameter(s) will have to be calibrated using either c06-math-0194 flows or c06-math-0195 flows. This means that, potentially, these parameters may not be completely appropriate for c06-math-0196 flows. In the absence of other means of estimating appropriate parameters, however, it could be argued this is the best option available at this time, and so it is the option we will have to take.

Accepting that available observed data will be used to calibrate the best-fit parameter(s), the next issue relates to the method used to carry out the calculation. Distance decay parameters in SIMs have historically been calibrated using maximum-likelihood techniques employed in computer algorithms – these commonly use iterative procedures to search for the ‘best-fit’ between the estimates created by the model and the sample data. As an aside, while standard iterative procedures are most frequently used in this type of modelling, it should be noted that a significant amount of work has been carried out by Openshaw and colleagues on the calibration of SIMs using genetic algorithms (Diplock and Openshaw, 1996; Openshaw, 1998): an approach perhaps operationalised most recently by Harland (2008) – we will not explore these methods here, but will use a conventional iterative approach. Batty and Mackie (1972) discuss a range of maximum-likelihood calibration methods, but the Newton–Raphson search algorithm has been shown to perform better than most and has been adopted in both the SIMODEL computer program developed by Williams and Fotheringham (1984) and the IMP program developed by Stillwell (1978); both Fortran programs using the search routine to find the parameter estimates which minimise the divergence between the mean value of the total distance travelled in the observed and modelled flow matrices – an approach also used by Pooler (1994). Thanks to its successful implementation in SIMs for migration analysis, the Newton–Raphson algorithm is the one that we choose to use here.
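The calibration idea described above can be sketched as follows. This is a minimal illustration rather than a reproduction of SIMODEL or IMP: the function names, the toy data, the exponential decay form and the finite-difference approximation to the derivative are all assumptions. The search adjusts the distance decay parameter with Newton–Raphson steps until the mean distance travelled in the modelled flows matches that in the observed flows.

```python
import numpy as np

def model_flows(O, D, cost, beta, tol=1e-10, max_iter=5000):
    """Doubly constrained model, exponential decay, no intra-zonal flows."""
    f = np.exp(beta * cost)
    np.fill_diagonal(f, 0.0)
    A, B = np.ones_like(O), np.ones_like(D)
    for _ in range(max_iter):
        A = 1.0 / (f @ (B * D))
        B = 1.0 / (f.T @ (A * O))
        T = np.outer(A * O, B * D) * f
        if np.abs(T.sum(axis=1) - O).max() < tol:
            break
    return T

def mean_distance(T, cost):
    """Mean distance travelled per migrant."""
    return (T * cost).sum() / T.sum()

def calibrate_beta(observed, cost, beta0=-0.1, h=1e-5, tol=1e-8, max_iter=50):
    """Newton-Raphson search for beta on the mean-distance criterion."""
    O, D = observed.sum(axis=1), observed.sum(axis=0)
    target = mean_distance(observed, cost)

    def gap(b):
        # Difference between modelled and observed mean distance travelled.
        return mean_distance(model_flows(O, D, cost, b), cost) - target

    beta = beta0
    for _ in range(max_iter):
        g = gap(beta)
        if abs(g) < tol:
            break
        slope = (gap(beta + h) - gap(beta - h)) / (2 * h)  # numerical derivative
        beta -= g / slope
    return beta

# Toy system (hypothetical): five zones on a line, with "observed" flows
# generated from the model itself using beta = -0.5.
x = np.array([0.0, 1.0, 3.0, 6.0, 10.0])
cost = np.abs(x[:, None] - x[None, :])
O = np.array([100.0, 150.0, 200.0, 150.0, 100.0])
D = np.array([120.0, 130.0, 200.0, 130.0, 120.0])
observed = model_flows(O, D, cost, -0.5)
beta_hat = calibrate_beta(observed, cost)
```

Because the toy flows were generated by the same model, the search should recover a value close to the generating parameter; with real data the fitted value is simply the one that reproduces the observed mean distance.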

Initially two versions of the doubly constrained model were run to calibrate a best-fit general distance decay c06-math-0197 parameter for the whole system. The results of these models are shown in Table 6.1 and are contrasted with a more basic singly constrained model for comparison. Here, a selection of goodness-of-fit (GOF) statistics are displayed – the coefficient of determination (R2), the square root of the mean squared error (SRMSE), the sum of the squared deviations and the percentage of misallocated flows – although they all display very similar findings. It is clear that the doubly constrained model with the inverse power function applied to the distance matrix produces the best fit to the original data, with an R2 of some 87%. This compares to an R2 of 72% for the negative exponential function and 62% for the reference production constrained model.
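For readers wishing to compute statistics of this kind, the sketch below evaluates the four measures for a pair of observed and modelled flow arrays. The formulas are assumptions based on common usage in the spatial interaction literature, not definitions given in this chapter: R2 is taken as the squared Pearson correlation, SRMSE as the root mean squared error divided by the mean observed flow, and the percentage misallocated as half the sum of absolute deviations relative to the total observed flow. The scaling behind the values reported in Table 6.1 may differ, and the function name and toy numbers are hypothetical.

```python
import numpy as np

def goodness_of_fit(observed, modelled):
    """Return a dictionary of goodness-of-fit statistics for two flow arrays."""
    obs = np.asarray(observed, dtype=float).ravel()
    mod = np.asarray(modelled, dtype=float).ravel()
    resid = obs - mod
    sum_sq_dev = float((resid ** 2).sum())
    return {
        # Squared Pearson correlation between observed and modelled flows.
        "r2": float(np.corrcoef(obs, mod)[0, 1] ** 2),
        # Root mean squared error standardised by the mean observed flow.
        "srmse": float(np.sqrt(sum_sq_dev / obs.size) / obs.mean()),
        "sum_sq_dev": sum_sq_dev,
        # Each misplaced migrant is counted once as a deficit and once as a
        # surplus, hence the division by two.
        "pct_misallocated": float(100.0 * np.abs(resid).sum() / (2.0 * obs.sum())),
    }

# Toy flows (hypothetical values) with equal totals in both arrays.
stats = goodness_of_fit([10, 20, 30, 40], [12, 18, 33, 37])
```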

Table 6.1 Goodness-of-fit statistics for c06-math-0198 model experiments

Model equation  c06-math-0199  R2     SRMSE   Sum Sq Dev      % Misallocated
c06-math-0200   −4.2986        0.718  39.393  10,456,839,051  21.554
c06-math-0201   −0.9136        0.865  27.992  5,280,098,085   17.008
c06-math-0202   −1.2201        0.623  45.457  13,886,764,628  28.131

The question that follows is: should this overall distance decay parameter be used as the distance decay input to the estimation model? If this parameter is representative of the whole system, then it could be argued that it should. To test this, a c06-math-0203 model with an inverse power distance decay function (akin to that in the second row of Table 6.1) was run separately for each of the 21 countries in the system comprising more than a single zone in order to calibrate a series of c06-math-0204 parameters. The results of these experiments are shown in Table 6.2.

Table 6.2 Goodness-of-fit statistics for inter-regional migration data modelled with a doubly constrained model with a power distance decay β parameter

Country code  Country         R2     c06-math-0205 (power function)
FI            Finland         0.996  −0.754
SE            Sweden          0.974  −0.771
AT            Austria         0.972  −0.747
HU            Hungary         0.963  −0.567
SK            Slovakia        0.948  −0.773
NL            Netherlands     0.936  −1.279
DK            Denmark         0.930  −0.969
NO            Norway          0.919  −0.814
BG            Bulgaria        0.901  −0.825
CZ            Czech Republic  0.889  −0.807
UK            United Kingdom  0.884  −0.927
PL            Poland          0.877  −1.068
CH            Switzerland     0.788  −0.867
BE            Belgium         0.772  −1.049
RO            Romania         0.745  −0.763
DE            Germany         0.715  −0.760
IT            Italy           0.699  −0.718
ES            Spain           0.621  0.154
FR            France          0.549  1.093

In this instance, we chose the inverse power distance decay function as it was the best-performing function in the c06-math-0206 experiment. Serendipitously, the power function is scale independent whereas the exponential function is not (Fotheringham and O'Kelly, 1989), meaning we are able to compare the c06-math-0207 parameters directly. In Table 6.2, we use the R2 value as our measure of goodness of fit. We are aware that there has been some debate over which is the most appropriate metric to use (Knudsen and Fotheringham, 1986); however, R2 is commonly used and for comparative purposes, the choice of statistic has little relevance to the outcome. A number of points can be made about the results displayed in Table 6.2. Firstly, the countries are ranked according to their goodness of fit and we can observe that around half of the list have R2 values over 90%, with Finland, Sweden and Austria ranked the highest – Finland with an exceptionally high R2. It is clear, however, that there is a considerable variation in the c06-math-0208 parameters for each country. This would suggest that it may not be ideal to use the generalised c06-math-0209 parameter to model flows for the whole EU system. Furthermore, the reliability of some of the c06-math-0210 parameters can be called into question with particularly low R2 values for Spain and France – countries which exhibit positive c06-math-0211 parameter values. The exact way in which these parameters can be understood has been questioned (Fotheringham, 1981); however, one interpretation is that the value can be read behaviourally and the number is an index of the deterrent to migration, with high negative values representing distance being a strong deterrent to migration and low negative values implying that distance is a weak deterrent. Positive values in this context would indicate that distance is an attraction to interaction – that is, the further apart origins and destinations are, the more likely migration is to occur. Clearly this is unlikely to be the case across the whole of Spain and France.
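The scale independence of the power function, noted above, can be made explicit with a short derivation. Suppose, purely for illustration, that distances are re-expressed in different units so that each distance d_ij becomes k d_ij for some constant k > 0 (the symbols d_ij, k and β are assumed notation here):

```latex
\begin{align*}
\text{power:} \quad & (k\,d_{ij})^{\beta} = k^{\beta}\, d_{ij}^{\beta}, \\
\text{exponential:} \quad & \exp(\beta\, k\, d_{ij}) = \exp\!\big((\beta k)\, d_{ij}\big).
\end{align*}
```

In the power case the factor k^β is common to every cell and is absorbed by the balancing factors, so the calibrated β is unchanged. In the exponential case the same fitted flows require the parameter β/k, so values calibrated on systems measured at different distance scales are not directly comparable.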

Given this evidence, generalised distance decay parameters are currently poor candidates for inputs into an estimation model for the whole of Europe. A potential solution, therefore, would be to use distance decay parameters which are specific to each NUTS2 zone – a technique first outlined by Stillwell (1978). This returns us to Model (i) and Equations (6.18) and (6.19).

The GOF statistics for Model (i) (taken for all internal migration flows in the system rather than for each separate country) are shown in Table 6.3. Evidently, these models provide much better fits than the generalised parameter models, with R2 values around 93%. A geography to these distance decay parameters can be observed, with the frictional effects of distance operating very differently for in- and out-migration flows across the EU system, as is shown in Figures 6.10 and 6.11. It should be noted that the nature of the algorithm used to carry out this calibration means that where it is not possible to calculate a zone-specific distance decay parameter (e.g. in those countries where c06-math-0212 data do not exist such as Greece), a generalised distance decay parameter which is calculated for the whole system prior to zone-specific calibration is allocated. Given the results of these experiments, it is these origin and destination-specific parameters calibrated on internal migration data which will be used as distance decay inputs into our later estimation models.

Table 6.3 Goodness-of-fit statistics for Model (i) with c06-math-0213 and c06-math-0214 parameters

Model equation   R²      SRMSE    Sum Sq Dev       % Misallocated
c06-math-0215    0.928   19.802   2,642,462,153    12.284
c06-math-0216    0.931   19.582   2,583,959,209    12.163

Figure 6.10 c06-math-0217 values calibrated on inter-regional, intra-country migration data, 2006

Figure 6.11 c06-math-0218 values calibrated on inter-regional, intra-country migration data, 2006

6.5 Model Experiments

The first step is the estimation of margin constraints. The models in the MLSIM family outlined in Section 6.3 that are used to estimate c06-math-0219 flows all require some inputs which are not available directly from the data to hand. In addition to the distance decay parameters that will be calibrated only on internal migration data, Models (iiia), (iiib), (iv), (v) and (vi) make use (directly and indirectly) of c06-math-0220 and c06-math-0221 margins. Consequently, sub-models are required to make estimates of these data. When c06-math-0222 and c06-math-0223, it follows that it should be feasible to estimate the NUTS2-level c06-math-0224 and c06-math-0225 margins from the country-level c06-math-0226 and c06-math-0227 margins, given the appropriate ratio values. But which are the appropriate ratios to use?

As information at the internal migration c06-math-0228 level is complete, it might be possible to use the distribution of internal migrants to estimate the distribution of international migrants such that

The assumption here is that the distribution of internal in- and out-migrants within countries is the same as the distribution of immigrants and emigrants moving between countries. But can internal migrant distributions be used to estimate distributions of international migrants within countries accurately? We might expect, for example, capital cities to dominate these distributions, with larger urban areas also providing significant origins and destinations at both levels. Is this the case in reality? Figure 6.12 shows the comparable distributions of internal and international migration for a selection of European countries at NUTS2 level (all countries where comparable data exist at this level), taken from census data from the 2000–2001 census round and compiled by Eurostat. Broadly speaking, there are positive correlations between internal and international migration distributions, although there are some noticeable differences in the strength of correlation, as shown by the R² values (and the scatter plots). For most countries in the selection, R² values are over 80%, indicating that internal migration distributions are reasonably good predictors of international migration distributions. For some countries, however, this predictive relationship is weak. Poland, for example, has an R² value of only 17%, with the Czech Republic (23%) and Switzerland (28%) not faring much better. The reasons for the lack of correlation in these countries are difficult to pinpoint, but differences in the perceived attractiveness of particular destinations to internal and international migrants will affect the correlations. Studying Figure 6.12, the scatter plots show that there is very little pattern in the association between internal in-migration and immigration in Poland, although examining Switzerland and the Czech Republic, it appears that were it not for one or two outliers in the scatter plots, the correlation would be far stronger. Through mapping the differences between internal and international migrant distributions, it is possible to interrogate these and other outliers a little further.
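
The comparison described above can be sketched in a few lines for a single country. This is an illustration under stated assumptions rather than the procedure used to produce the figures: the counts are invented, the helper `regional_shares` is hypothetical, and R² is taken to be the squared Pearson correlation between the two sets of regional shares.

```python
import numpy as np

def regional_shares(counts):
    """Convert regional migrant counts into shares of the national total."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

# Hypothetical counts for four regions of one country (first region = capital)
internal_in = np.array([5000, 3000, 1500, 500])     # internal in-migrants
international_in = np.array([4200, 900, 600, 300])  # immigrants from abroad

s_internal = regional_shares(internal_in)
s_international = regional_shares(international_in)

# Strength of association between the two distributions
r2 = np.corrcoef(s_internal, s_international)[0, 1] ** 2

# Difference in shares, in percentage points; positive values mean the
# region takes a larger share of international than of internal migrants
diff_pp = 100.0 * (s_international - s_internal)

print("R2:", r2)
print("difference in shares (percentage points):", diff_pp)
```

Because both sets of shares sum to one, the differences always sum to zero, so a large positive deviation in one region must be offset by negative deviations elsewhere in the same country.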

Figure 6.12 Correlation between internal (‘Place of residence changed outside the NUTS3 area’) and international (‘Place of residence changed from outside the declaring country’) migrant distributions for NUTS2 regions, selected EU countries, 2001.

Source: Raymer and Abel (2008)

Figure 6.13 maps the distribution of the differences between the regional shares of internal and international (in-)migration across NUTS2 zones in Europe (where data are available). A number of points should be made about this map. Firstly, all yellow zones signify less than a 1% deviation between the distribution of internal and international migrants – these zones include much of the United Kingdom and large parts of France, Italy, the Czech Republic, Poland and Greece. In these areas, internal migration distributions can be seen to be good predictors of international migration distributions. Secondly, zones in light orange and light green show only up to a 3% deviation – these include most of the rest of France, a number of regions in Scandinavia, the Netherlands, Poland, the United Kingdom, Italy and Greece. Perhaps the most important point of note, however, which becomes very apparent when examining Figure 6.13, is that there appears to be a ‘capital city effect’. The regions containing London, Paris, Madrid, Rome, Amsterdam, Stockholm, Helsinki, Prague, Lisbon and Dublin all exhibit a noticeably higher (average 8.7%) proportion of the national share of international immigrants compared to the national share of internal in-migrants. Some capital cities go against this pattern, although Bern can probably be discounted since, in terms of city status, Zurich (which matches this trend) could be argued to be a city of more importance within Switzerland. Oslo, Athens and Budapest have lower proportions of international immigrants than internal in-migrants, but the city region where a very large trend in the opposite direction occurs is Warsaw in Poland. Here, the proportion of internal migrants to Warsaw is over 18% higher than the proportion of international migrants. The fact that Warsaw is an attractive destination for internal migrants would not be surprising, but why it accounts for a much larger proportion of these migrants compared to international migrants is unclear without further investigation of the particular motivations of migrants in Poland.

Figure 6.13 Distribution of NUTS 2 regions where shares of internal and international in-migrants differ, selected EU countries, 2001

Based on this, it could be argued that if this capital city effect could be accounted for consistently, and the proportions of migrants associated with other regions in the country adjusted accordingly, then internal migration distributions could be used to make international migration c06-math-0231 margin estimates relatively reliable, assuming that these associations hold over time.

Incidentally, the time dimension provides us with another option for modelling the sub-national distributions of international migrants. Where decennial census (or other periodic) data can provide sub-national immigrant distributions, if country-level immigrant data are available, sub-national distributions can be estimated with the formula:

6.49 equation

Even if more up-to-date national data are not available, an assumption could be made that these ratios hold over time so that

6.50 equation

Returning to Equations (6.47) and (6.48), unfortunately the nature of the data collated by Eurostat means that it is not possible to assess whether emigrant distributions also follow the distributions of internal out-migrants (these data are census/population register data relating to resident populations in recording countries and therefore cannot contain emigrant data). Given the high degree of association between internal in- and out-migration distributions (Figure 6.14), it might be reasonable to use international immigrant distributions to estimate international emigrant distributions, but the capital city effect would need to be explored before this could be done with confidence. Here our concern is to present a general methodology for estimating the full EU matrix of NUTS2 flows and so we will not dwell on this element of the estimation process at this stage, although it should be stressed that the estimation of c06-math-0234 and c06-math-0235 marginal values will have an important bearing on the reliability of the final modelled outputs.

Figure 6.14 Correlation between the NUTS2 regional share of internal in- and out-migration flows across EU countries, 2006

As a consequence of the data to hand and the investigations of internal/international migration associations, at this stage internal migration distributions will be used to estimate c06-math-0236 and c06-math-0237 marginal values for the model as in Equations (6.47) and (6.48), but we recognise that this is an area of the methodology which could be improved in the future.
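
The allocation step described here amounts to sharing each country-level total across that country's regions in proportion to a reference distribution. A minimal sketch follows, assuming simple proportional allocation; the function name `allocate_to_regions`, the integer country codes and all figures are hypothetical. The same function could be given census-based immigrant counts as the reference distribution instead of internal migration counts.

```python
import numpy as np

def allocate_to_regions(country_totals, reference_counts, country_of_region):
    """Share country-level totals across regions using reference shares.

    country_totals    : one total per country (e.g. immigrants to the country)
    reference_counts  : one count per region (e.g. internal in-migrants)
    country_of_region : integer country code (0, 1, ...) for each region
    """
    totals = np.asarray(country_totals, dtype=float)
    ref = np.asarray(reference_counts, dtype=float)
    codes = np.asarray(country_of_region)

    # National sum of the reference counts, then each region's share of it
    national_ref = np.bincount(codes, weights=ref, minlength=totals.size)
    shares = ref / national_ref[codes]

    # Regional margin = country total x regional share
    return totals[codes] * shares

# Hypothetical system: three regions in country 0, two in country 1
country_of_region = [0, 0, 0, 1, 1]
internal_in = [600, 300, 100, 800, 200]   # internal in-migrants per region
immigration = [50000, 20000]              # country-level immigration totals

regional_margins = allocate_to_regions(immigration, internal_in, country_of_region)
print(regional_margins)
```

By construction the regional estimates sum back to the country totals, which is the consistency property the margin constraints rely on. Any systematic departure from the reference shares, such as the capital city effect, is not corrected by this step.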

6.6 Results

A file containing the full suite of Models (iv), (v) and (vi) c06-math-0238 and c06-math-0239 estimates is publicly available to anyone wishing to make use of the data through the following link:

  1. http://dl.dropbox.com/u/8649795/Multilevel_SIM_Results.xlsx.

Model (iv) takes in c06-math-0240 and c06-math-0241 inputs from Model (iiib) and can be viewed as the optimum model as any outputs will be constrained to known c06-math-0242 flows and estimated c06-math-0243 and c06-math-0244 margins (where c06-math-0245 and c06-math-0246 estimates used these constraints). Models (v) and (vi), in contrast, are suboptimal as estimates will not be constrained to c06-math-0247 flows, only c06-math-0248 and c06-math-0249, or c06-math-0250 and c06-math-0251 margins. Running suboptimal models is an important part of the model-building process as they allow us to explore the reliability of some of the general model assumptions; however here, we present just the results from the optimum Model (iv) of the family of MLSIMs.
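
The idea of constraining estimates simultaneously to inter-country flows and to regional margins can be illustrated with a generic biproportional (iterative proportional fitting) routine extended with a third, country-block constraint. The sketch below is not the authors' Model (iv): the function name `fit_multilevel`, the four-region, two-country system, the distance matrix, the power decay exponent of -1 and the reference matrix used only to generate mutually consistent totals are all hypothetical.

```python
import numpy as np

def fit_multilevel(seed, row_tot, col_tot, country, block_tot,
                   max_iter=1000, tol=1e-9):
    """Scale a positive seed matrix to match three sets of totals:
    regional out-flows (rows), regional in-flows (columns) and
    country-to-country flows (blocks). The totals must be consistent."""
    T = np.asarray(seed, dtype=float).copy()
    n_countries = block_tot.shape[0]
    # M[r, c] = 1 if region r belongs to country c, else 0
    M = np.eye(n_countries)[np.asarray(country)]
    for _ in range(max_iter):
        T *= (row_tot / T.sum(axis=1))[:, None]      # match row totals
        T *= (col_tot / T.sum(axis=0))[None, :]      # match column totals
        blocks = M.T @ T @ M                         # current country-pair sums
        T *= M @ (block_tot / blocks) @ M.T          # match block totals
        err = max(np.abs(T.sum(axis=1) - row_tot).max(),
                  np.abs(T.sum(axis=0) - col_tot).max())
        if err < tol:
            break
    return T

# Hypothetical system: regions 0-1 in country 0, regions 2-3 in country 1
country = [0, 0, 1, 1]
dist = np.array([[10.0, 50.0, 300.0, 400.0],
                 [50.0, 10.0, 350.0, 420.0],
                 [300.0, 350.0, 10.0, 80.0],
                 [400.0, 420.0, 80.0, 10.0]])
seed = dist ** -1.0   # inverse power distance decay with an assumed exponent

# Invented reference flows, used only to derive consistent constraints
ref = np.array([[120.0, 60.0, 8.0, 5.0],
                [70.0, 90.0, 6.0, 4.0],
                [10.0, 7.0, 150.0, 40.0],
                [6.0, 5.0, 45.0, 110.0]])
M = np.eye(2)[country]
row_tot, col_tot = ref.sum(axis=1), ref.sum(axis=0)
block_tot = M.T @ ref @ M

T = fit_multilevel(seed, row_tot, col_tot, country, block_tot)
print(np.round(T, 2))
```

The fitted matrix reproduces all three sets of totals while the seed, here a pure distance decay surface, determines how flows are distributed within each country-to-country block. Dropping the block step leaves an ordinary doubly constrained fit, which is the sense in which models without the inter-country constraint use less of the available information.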

MLSIMs offer the opportunity to examine inter-regional flows between all countries in our chosen EU system – examining all flows or even all significant flows would be an extensive task; therefore, we will take the United Kingdom as an example. Figure 6.15 depicts all flows over 200 persons entering UK regions from other EU regions, and it is clear that particular origin and destination combinations predominate. Firstly, the importance of London and the South-East corner of the United Kingdom is very apparent – nearly all flows are concentrated in this area, with only a small number entering regions containing other large cities such as Manchester and Birmingham. A large number of these flows originate in Polish regions and many terminate in London. Interestingly, flows from Poland into East Anglia, which have gained much media attention in the United Kingdom, are picked up by the model, despite other explanatory factors such as increased job opportunities in the agricultural sector not being taken into consideration by the model. One small caveat in relation to these flows can be made, referring back to our observations in Section 6.5 about the poor relationship between internal and international migration distributions in Poland. Because this relationship is poor in Poland, the precise flow volumes originating from Polish NUTS2 regions should be treated with caution. Where these relationships are stronger, as in France and Spain, the large flows from other major capital cities such as Paris and Madrid can be viewed as more reliable; indeed, given the ‘capital city effect’ noted earlier, these flows may even be larger in reality. High-volume flows are also noticeable from Cyprus, although these may well be associated with the movement of armed forces.

Figure 6.15 Flows greater than 200 migrants entering UK regions from other EU system regions, 2006

Examining the flows out of UK regions to the rest of the EU system (Figure 6.16), the South-East – and especially London – predominates, as with immigration. Destinations for migrants leaving the United Kingdom are quite different from the origins of those arriving in 2006. Large volumes of migration (which we may assume are related to retirement) can be observed flowing into Spanish regions – those containing the largest cities of Madrid, Barcelona and Valencia, as well as the Costa del Sol. Large flows can also be observed from London and other regions of the United Kingdom into Ireland – this is partially a function of Ireland consisting of only two regions, so that these flows appear more concentrated, although the close ties between all countries of the United Kingdom and Ireland mean that these flows are entirely expected.

Figure 6.16 Flows greater than 200 migrants leaving UK regions for other EU system regions, 2006

6.7 Conclusions and Comments on the New Framework for Estimating Inter-regional, Inter-country Migration Flows in Europe

In this chapter, we have introduced a new family of models for estimating inter-regional migration flows in Europe. Our guiding principle was a simple one – to make use of the maximum amount of available data (embodied in the constraints imposed within the model and the parameters used to influence the patterns) to produce estimates of the maximum likelihood given the information available.

The estimates produced by Model (iv) represent the current ‘best-guess’ given the data to hand. They embody all known information about flows into and out of countries, the behaviour of internal migrants within their home countries and the relationships between the destination preference of internal and international migrants. There are, of course, a number of areas where these estimates could be improved. Firstly, the country-level international migration data constraints are themselves estimates. The data used were taken from the MIMOSA project (Raymer and Abel, 2008) – data which the authors recognise the limitations of and which will soon be superseded by improved estimates from the IMEM project mentioned in the introduction. When these model inputs can be improved, then there will be a knock-on improvement to our own estimates. We have already acknowledged that there are issues with the methodology we employed to estimate the c06-math-0252 and c06-math-0253 matrix margins which formed constraints either directly or indirectly for all models. As outlined, in these estimates, we have simply taken the national distributions of internal migrants to distribute international migrants. While there are high correlations between these distributions for in-migration, demonstrated across Europe from Census and register data, a ‘capital city effect’ persists where these destinations can attract up to 10% more migrants internationally than internally. Furthermore, we have been unable to ascertain whether a similar situation exists for out-migration flows. Finally, in using distance decay parameters calibrated with internal migration data, we could be introducing error where internal migration flows, even in an open border Europe, act very differently to international flows. 
Experimentation with suboptimal models which are not reported in this chapter suggests that this might be the case, with country border effects far stronger than the in- and out-migration constrained models estimate.

Model (iv) constrains inter-regional estimates to known (estimated) inter-country flows, allowing us to explore the likely inter-regional international flows within Europe. This is an important development as, for the first time, we are able to examine, at a much higher resolution than previously possible, pressure points within the migration system. Not shown in this chapter, but evident in the results which are available through the web link given at the beginning of Section 6.6, are the regions in central and southern Spain which are likely destinations for the large influx of migrants from Romania, along with the areas of Romania which are equally (if not more) affected socially, demographically and economically by these large flows of people. In the United Kingdom, we have shown the localised concentrations of migrant flows particularly into London, the South-East and East Anglia, and especially from regions in Poland.

Even in this optimum model, there are improvements that can be made. Now that the modelling framework is in place, however, improved outputs can be achieved very easily whenever improved inputs can be supplied to the model. In this chapter we have concentrated on 2006, but data (albeit less comprehensive) for other years exist, and so a natural extension to this work would be to explore the temporal dynamics of particular sets of inter-regional flows in the system. Furthermore, in this analysis, we have chosen Europe to exemplify our models, but clearly we need not be limited to Europe – the model can easily be applied to estimate sub-national flows in a global context, opening up exciting possibilities for a more complete global sub-national understanding of migration. Implementation of a wider spatial system and broader temporal base means that the model framework introduced in this chapter should provide a useful tool for policy decisions related to demographic trends both in Europe and further afield.

References

  1. Abel, G.J. (2010) Estimation of international migration flow tables in Europe. Journal of the Royal Statistical Society: Series A (Statistics in Society), 173 (4), 797–825.
  2. Batty, M. and Mackie, S. (1972) The calibration of gravity, entropy, and related models of spatial interaction. Environment and Planning, 4 (2), 205–233.
  3. Boyle, P.J., Flowerdew, R. and Shen, J. (1998) Modelling inter-ward migration in Hereford and Worcester: The importance of housing growth and tenure. Regional Studies, 32 (2), 113–132.
  4. Cangiano, A. (2011) Demographic Objectives in Migration Policy-Making, The Migration Observatory, University of Oxford, Oxford. http://migrationobservatory.ox.ac.uk/sites/files/migobs/Demographic%20Objectives%20Policy%20Primer.pdf (accessed 12 January 2016).
  5. Convey, A. and Kupiszewski, M. (1995) Keeping up with Schengen: Migration and policy in the European Union. International Migration Review, 29 (4), 939–963.
  6. De Beer, J., Van der Gaag, N., Van der Erf, R., Bauer, R., Fassmann, H., Kupiszewska, D., Kupiszewski, M., Rees, P., Boden, P., Dennett, A., Stillwell, J., De Jong, A., Ter Veer, M., Roto, J., Van Well, L., Heins, F., Bonifazi, C. and Gesano, G. (2010) DEMIFER - Demographic and Migratory Flows affecting European Regions and Cities, Applied Research Project 2013/1/3. Final Report, ESPON and NIDI. http://www.espon.eu/main/Menu_Projects/Menu_AppliedResearch/demifer.html (accessed 12 January 2016).
  7. Dennett, A. and Wilson, A. (2013) A multi-level spatial interaction modelling framework for estimating inter-regional migration in Europe. Environment and Planning A, 45, 1491–1507. http://www.envplan.com/abstract.cgi?id=a45398
  8. Diplock, G. and Openshaw, S. (1996) Using simple genetic algorithms to calibrate spatial interaction models. Geographical Analysis, 28 (3), 262–279.
  9. Flowerdew, R. (2010) Modelling migration with Poisson regression, in Technologies for Migration and Commuting Analysis: Spatial Interaction Data Applications (eds J. Stillwell, O. Duke-Williams and A. Dennett), IGI Global.
  10. Fotheringham, A.S. (1981) Spatial structure and distance-decay parameters. Annals of the Association of American Geographers, 71 (3), 425–436.
  11. Fotheringham, A.S., Rees, P., Champion, T. et al. (2004) The development of a migration model for England and Wales: Overview and modelling out-migration. Environment and Planning A, 36 (9), 1633–1672.
  12. Fotheringham, A.S. and O'Kelly, M.E. (1989) Spatial Interaction Models: Formulations and Applications, Kluwer Academic Publishers.
  13. Harland, K. (2008) Journey to Learn: Geographical Mobility and Education Provision, University of Leeds.
  14. He, J. and Pooler, J. (2003) Modeling China's province-to-province migration flows using spatial interaction model with additional variables. Geographical Research Forum, 23, 30–55.
  15. Knudsen, D.C. and Fotheringham, A.S. (1986) Matrix comparison, goodness-of-fit, and spatial interaction modeling. International Regional Science Review, 10 (2), 127–147.
  16. Kraler, A., Jandl, M. and Hofmann, M. (2006) The evolution of EU migration policy and implications for data collection, in Towards the Harmonisation of European Statistics on International Migration (THESIM) (eds M. Poulain, N. Perrin and A. Singleton), Université Catholique de Louvain–Presses Universitaires de Louvain, Louvain-la-Neuve, pp. 35–75.
  17. Kupiszewska, D. and Kupiszewski, M. (2010) Deliverable 4 – Multilevel Scenario Model, DEMIFER – Demographic and Migratory Flows Affecting European Regions and Cities, ESPON & CEFMR, Warsaw. http://www.espon.eu/export/sites/default/Documents/Projects/AppliedResearch/DEMIFER/FinalReport/DEMIFER_Deliverable_D4_final.pdf (accessed 12 January 2016).
  18. Mathä, T. and Wintr, L. (2009) Commuting flows across bordering regions: A note. Applied Economics Letters, 16 (7), 735–738.
  19. Olsson, G. (1970) Explanation, prediction, and meaning variance: An assessment of distance interaction models. Economic Geography, 46, 223–233.
  20. Openshaw, S. (1998) Neural network, genetic, and fuzzy logic models of spatial interaction. Environment and Planning A, 30 (10), 1857–1872.
  21. Plane, D.A. (1982) An information theoretic approach to the estimation of migration flows. Journal of Regional Science, 22 (4), 441–456.
  22. Pooler, J. (1994) An extended family of spatial interaction models. Progress in Human Geography, 18 (1), 17–39.
  23. Poulain, M., Perrin, N. and Singleton, A. (eds) (2006) THESIM: Towards Harmonised European Statistics on International Migration, Presses universitaires de Louvain, Louvain-la-Neuve.
  24. Raymer, J. and Abel, G. (2008) The MIMOSA model for estimating international migration flows in the European Union, Joint UNECE/Eurostat Work Session on Migration Statistics, UNECE/Eurostat, Geneva, (Working paper 8). http://www.unece.org/stats/documents/ece/ces/ge.10/2008/wp.8.e.pdf (accessed 12 January 2016).
  25. Rees, P. (1977) The measurement of migration from census and other sources. Environment and Planning A, 9, 257–280.
  26. Singleton, A., Wilson, A. and O'Brien, O. (2010) Geodemographics and spatial interaction: An integrated model for higher education. Journal of Geographical Systems, 14, 1–19.
  27. Stillwell, J. (1978) Interzonal migration: Some historical tests of spatial-interaction models. Environment and Planning A, 10, 1187–1200.
  28. Taylor, P.J. (1983) Distance Decay in Spatial Interactions, CATMOD, 2, Geo Books, Norwich.
  29. Williams, P.A. and Fotheringham, A.S. (1984) The Calibration of Spatial Interaction Models by Maximum Likelihood Estimation with Program SIMODEL, Dept. of Geography, Indiana University.
  30. Wilson, A. (1970) Entropy in urban and regional modelling, in Monographs in Spatial and Environmental Systems Analysis (eds R.J. Chorley and D.W. Harvey), Pion, London.
  31. Wilson, A. (1971) A family of spatial interaction models, and associated developments. Environment and Planning A, 3, 1–32.