Adam Dennett and Alan Wilson
In this chapter, we show how methods of biproportional fitting – assembled through the use of entropy-maximising methods – can be used to generate estimates of missing data, and particularly flows, from partially complete sets of data. This enables us to generate inter-regional migration flows within Europe.
Understanding migration is one of the enduring challenges facing geographers and demographers worldwide. The challenge persists, thanks to the range of territories and geographical scales of interest, the difficulty in dealing with inconsistent definitions of migrants and migration events, the variable (and often poor) quality of data and the large and sometimes complex array of tools available. While an understanding of migration patterns and processes at the global scale presents possibly the largest challenge, in Europe we still know far less about the movements of people within the Union than may be expected given the continued desire for knowledge about population change and the amount of demographic data made available from member countries (Poulain et al., 2006). Acknowledging this, a number of recent projects have made attempts to address some of the limitations of (intra-) European migration data. Against a background of varying migrant definitions, inconsistent data relating to the same flows collected for origins and destinations, and incomplete matrices, the MIMOSA (Modelling migration and migrant populations) project (Raymer and Abel, 2008), produced a series of inter-country migration estimates for years between 2002 and 2006 through harmonising available data and using a multiplicative modelling framework to model flows between countries. Following on from this, the IMEM (Integrated Modelling of European Migration) project (van der Erf et al. – http://www.nidi.nl/Pages/NID/24/842.TGFuZz1FTkc.html) is currently looking to improve upon the methodology employed in MIMOSA through a Bayesian statistical approach. Further work has also been carried out by Abel (2010) who used a negative binomial regression (spatial interaction) model to estimate inter-country flows using a suite of predictor variables.
All of these projects have limited their scope to inter-country flows, but within Europe much of the focus of the European Commission is on regional policy (http://ec.europa.eu/regional_policy/index_en.cfm), which is intended to address the quite marked socio-economic disparities which persist between smaller zones within the Union. A recent project which had a partial focus on migration at the regional (Nomenclature of Territorial Units for Statistics level 2 – NUTS2) level in the European Union was the DEMIFER project (De Beer et al., 2010). One of the outputs from this project is a set of regional population projections for four different growth/cohesion scenarios which include a model of regional in- and out-migration based upon annual transition rates (Kupiszewska and Kupiszewski, 2010). While in- and out-migration rates tell us something about migration at the regional level within Europe, they reveal little about the interaction between regions and the hotspots of population exchange which occur within the Union, helping drive the dynamism and evolution of local population structures. Indeed, our knowledge of these exchanges across the whole Union is poor.
Within the United Kingdom, migration policy is rarely far from the headlines, although as Cangiano (2011) points out, there has been a certain disconnect between immigration policy and wider acknowledgement of demographic issues such as the ageing of the population. Compounding these macro policy problems, there is a local dimension to demographic issues and a current knowledge gap in relation to local immigration concentrations and emigration flows. The UK government has a limited capacity to predict or control the flows of EU nationals into the country and then where in the country they go once they have arrived; conversely knowledge of areas which are likely to experience increased pressures due to migration is vital for effective policy decisions to be made. Where these issues exist in the United Kingdom, we can be sure that similar issues are experienced in other EU member states.
Therefore, in this chapter, we propose a methodology for estimating the inter-regional flows which pose these particular local policy problems within Europe. The work builds on previous research which has made use of variations on the entropy maximising spatial interaction models (SIMs) first introduced by Wilson (1970, 1971) and used in migration research (He and Pooler, 2003; Plane, 1982; Stillwell, 1978). A new Multi-Level Spatial Interaction Model (MLSIM) is proposed which incorporates data at both country and regional levels in Europe to produce estimates of the inter-regional inter-country flows consistent with known information at these different levels. The heart of the method is biproportional fitting.
2006 is the year for which the maximum amount of migration data at all levels is available, and so we use this as our temporal base. The spatial system of 287 NUTS2 regions nested within 31 countries (EU 27 + Norway, Iceland and Switzerland – referred to subsequently in this chapter as the ‘EU system’) is shown in Figure 6.1. Migration data for some of the flows occurring are available. These data, along with cells representing missing data, can be visualised as an origin/destination matrix as shown for a sample of countries in Figure 6.2. The grey cells in Figure 6.2 represent inter-regional, intra-country (internal migration) flow counts, which are available for most countries in the system. Flows within NUTS2 regions (the white cells on the diagonal) are not included in this analysis. The internal migration data were collated for use in the ESPON-funded DEMIFER project (http://www.espon.eu/main/Menu_Projects/Menu_AppliedResearch/demifer.html), although in almost all cases, these data are freely available from the Eurostat statistics database (often referred to as ‘New Cronos’ – http://epp.eurostat.ec.europa.eu/portal/page/portal/statistics/search_database). Internal migration data for two countries – France and Germany – are not available on this database and were procured separately for DEMIFER from national statistical agencies. It should be noted, though, that while technically European NUTS2 zones, the French overseas departments of Guadeloupe, Martinique, Reunion and French Guiana are not included. The coloured cells represent inter-country flows. Consistent estimates of international (intra-Europe) origin/destination flows have been created for the 31 countries for our year of interest by Raymer and colleagues for the MIMOSA project (Raymer and Abel, 2008).
Missing data in this EU system matrix are the inter-country, inter-regional flows – for example, the flows from the three zones in Country 1 to the three zones in Country 3 which sum to the 4,856 migrants we know flowed between Country 1 and Country 3 in Figure 6.2. The modelling challenge, therefore, is to estimate these missing data in the matrix, making use of information available at both the country and regional levels. The ultimate goal is to produce a full set of inter-regional estimates which make the most use of all available flow information at all levels within the system. Therefore, it will be necessary to understand the full range of the models which can be built from the elements of the migration system. In defining a suite of models, it will become apparent that some are more likely to produce better results than others in different data scenarios – the model which produces the best results in this current data scenario may not be feasible to use where fewer data exist, and so other less optimal models in the family might produce the next best estimates given different data availability.
One question that arises from this challenge in the current context is whether it is feasible to treat this 287 zone EU system as a whole when it is the convention to make a distinction between ‘internal migration’ flows and ‘international migration’ flows. It could be argued that where national borders are real barriers to travel, two systems should be defined. However, in a post-Schengen Europe (Convey and Kupiszewski, 1995; Kraler et al., 2006) national boundaries are not the rigid constructs (both metaphorically and physically) they once were, with flows of migrants between member countries now (in principle) as easy as flows within them. Indeed it is not uncommon for another type of human flow – daily commutes – to occur between countries such as Denmark and Sweden or Luxembourg and Belgium (Mathä and Wintr, 2009). With this being the case, we might expect internal migration and international migration in these areas of Europe to be virtually interchangeable in terms of, for example, the motivations for moves or the limiting factors such as distance which curtail flows. Whether this is actually the case will be explored through the modelling experiments with different models in the family, which are detailed later in the chapter.
To achieve the task set out in Section 6.1, we will make use of a variation on the doubly constrained entropy maximising SIM (Wilson, 1970, 1971). SIMs are particularly appropriate in the context of migration where empirical studies and model experiments have demonstrated that the propensity to migrate decreases with distance (Boyle et al., 1998; Flowerdew, 2010; Fotheringham et al., 2004; He and Pooler, 2003; Singleton et al., 2010; Stillwell, 1978; Taylor, 1983). Indeed, Olsson (1970, p. 223) notes that ‘Under the umbrella of spatial interaction and distance decay, it has been possible to accommodate most model work in transportation, migration, commuting and diffusion’.
If $M$ is the number of migrant transitions (Rees, 1977), let capital letters such as $I$ and $J$ denote countries and let lower-case letters such as $i$ and $j$ denote NUTS2 regions within a country. Then let $M^{IJ}$ be the number of migrants from country $I$ to country $J$ in some time period (which we will leave implicit for ease of notation). Then we can denote by $M_{ij}^{IJ}$ the number of migrants from region $i$ in $I$ to region $j$ in $J$. For convenience we denote all the migration flows by $M$, but the different subscripts and superscripts indicate the different geographical levels in the system. This notation implies that we number the NUTS2 zones from 1 for each country $I$ rather than numbering them consecutively for the whole system.
The available data described in Figure 6.2 can then be shown as in Figure 6.3. We have inter-regional, intra-country data for each country – $M_{ij}^{IJ}$ where $I = J$. These internal migration flows could also be described with the notation $M_{ij}^{II}$ to distinguish them from inter-country inter-regional flows. Intra-regional flows – $M_{ii}^{II}$ – are not available. At the country level, inter-country flows $M^{IJ}$ are available.
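The row and column totals are known for the $M_{ij}^{II}$ elements, that is, at the NUTS2 level, and also for the inter-country levels. Let these be $O_i^I$ and $D_j^J$ and $O^I$ and $D^J$, respectively, so that:

$$\sum_{j \in I} M_{ij}^{II} = O_i^I \qquad (6.1)$$

$$\sum_{i \in J} M_{ij}^{JJ} = D_j^J \qquad (6.2)$$

$$\sum_{J \neq I} M^{IJ} = O^I \qquad (6.3)$$

$$\sum_{I \neq J} M^{IJ} = D^J \qquad (6.4)$$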
These row and column totals are depicted in expanded versions of Figures 6.2 and 6.3, as shown in Figure 6.4a and b. Note that the country-level $O^I$ and $D^J$ totals do not include intra-country data contained in the regional $O_i^I$ and $D_j^J$ totals – consistent with the common practice of not including intra-country flows in international migration analysis. Internal migration data are assumed to be consistent such that:

$$\sum_{i \in I} O_i^I = \sum_{j \in I} D_j^I \qquad (6.5)$$
The sample data shown in Figure 6.4a and b represent the information we currently have about our system of interest. The formulation thus far implies that we are not seeking to model flows at the NUTS2 level within each country (we have these data) and to and from other countries, . The ultimate modelling goal, however, is to estimate these inter-country regional level flows, effectively filling all interior cells in the matrix.
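In order to model these NUTS2 level flows between countries, we introduce another element of notation: $O_i^{IJ}$ and $D_j^{IJ}$ are, respectively, the out-migration flows from NUTS2 region $i$ in country $I$ to country $J$ and the in-migration flows to NUTS2 region $j$ in country $J$ from country $I$. $O_i^{IJ}$ and $D_j^{IJ}$ can be viewed as table sub-margins and are equivalent to $O_i^I$ and $D_j^J$ (where the country subscripts are dropped as flows are internal) so that

$$\sum_{j \in J} M_{ij}^{IJ} = O_i^{IJ} \qquad (6.6)$$

$$\sum_{i \in I} M_{ij}^{IJ} = D_j^{IJ} \qquad (6.7)$$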
Then in (6.3) and (6.4), for , would be given by
These sub-margin elements are shown in Figure 6.5a and b. In addition to these new sub-margins, two new row and column margins can also be calculated. and are directly related to and in that:
A final set of margins can be calculated for all interior cells in the matrix where
With a complete system description, we can then consider the variety of models which can be built. Equations (6.1)–(6.4), (6.6), (6.7), (6.9), (6.11), (6.13) and (6.14) can provide the core constraint equations for a suite of entropy maximising models, which can be used to estimate various elements and aggregations of the flows in the multi-level system matrix. We might describe this as a family of MLSIMs, with the model possibilities being the following:
If the accounting equations (6.1)–(6.4) are deployed as in Models (i) and (ii), this leads to the construction of doubly constrained models for which the main task would be to identify impedance functions, associated generalised costs $c_{ij}$, and the model parameter values. In migration research, cost is often the physical distance between places: the propensity to migrate decreases with distance and thus the cost of travel can be inferred to increase. Empirical studies have shown that this distance decay in migration propensity will often follow either a negative exponential or inverse power law (Stillwell, 1978). In SIMs, this is represented by a parameter $\beta$ (normally negative), which can be calibrated endogenously if data exist. In the equations which follow, we write the distance decay function, $f(c_{ij})$, as exponential – $\exp(\beta c_{ij})$ – although it would be just as appropriate to write it as a power law – $c_{ij}^{\beta}$.
Model (i) is the most straightforward and would produce
where the generalised distance decay parameter can be calibrated endogenously using data. An alternative version of this model could calculate origin or destination-specific parameters:
The inter-country Model (ii) would be
where balancing factors are calculated with equivalent equations to (6.16) and (6.17).
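The doubly constrained form that Models (i) and (ii) share can be illustrated with a minimal Python sketch. It assumes the standard Wilson formulation, $T_{ij} = A_i O_i B_j D_j \exp(\beta c_{ij})$, with the balancing factors $A_i$ and $B_j$ found by iterating until they stabilise (which is, in effect, biproportional fitting). The function name, the toy margins and the toy cost matrix are hypothetical and are not taken from the chapter's data.

```python
import numpy as np

def doubly_constrained(origins, destinations, cost, beta, tol=1e-9, max_iter=1000):
    """Estimate flows T_ij = A_i * O_i * B_j * D_j * exp(beta * c_ij).

    origins      -- known row totals O_i
    destinations -- known column totals D_j (must sum to the same total as O_i)
    cost         -- matrix of generalised costs c_ij (e.g. distance)
    beta         -- distance decay parameter (normally negative)
    """
    origins = np.asarray(origins, dtype=float)
    destinations = np.asarray(destinations, dtype=float)
    decay = np.exp(beta * np.asarray(cost, dtype=float))

    A = np.ones_like(origins)
    B = np.ones_like(destinations)
    for _ in range(max_iter):
        # A_i = 1 / sum_j B_j D_j f(c_ij);  B_j = 1 / sum_i A_i O_i f(c_ij)
        A_new = 1.0 / (decay @ (B * destinations))
        B_new = 1.0 / (decay.T @ (A_new * origins))
        converged = (np.allclose(A_new, A, rtol=tol, atol=0.0)
                     and np.allclose(B_new, B, rtol=tol, atol=0.0))
        A, B = A_new, B_new
        if converged:
            break
    return (A * origins)[:, None] * (B * destinations)[None, :] * decay

# Hypothetical three-zone example (assumed values, for illustration only).
origins = np.array([100.0, 200.0, 300.0])
destinations = np.array([250.0, 150.0, 200.0])
cost = np.array([[1.0, 5.0, 8.0],
                 [5.0, 1.0, 4.0],
                 [8.0, 4.0, 1.0]])
T = doubly_constrained(origins, destinations, cost, beta=-0.1)
```

The estimated matrix reproduces both sets of margins; the interior pattern is then governed entirely by the cost matrix and the value of $\beta$. In practice the diagonal would be handled separately, since intra-regional flows are excluded from the analysis.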
The asymmetric models in Model (iiia) would take the form
With the balancing factors for (6.21):
and the balancing factors for (6.22):
Equations (6.21) and (6.22) can be visualised easily by collapsing the matrices in Figure 6.5a and b into just the relevant margins and sub-margins (Figures 6.4a and b, 6.5a and b). These margins then become, effectively, the values in a standard two-dimensional matrix.
It is important to note that while in the examples in Figures 6.6a and 6.7a corresponding country-to-country sums are equal, as they should be, in Model (iiia) the modelled values will not correspond in this way, due to the constraints used. To exemplify, consider Figures 6.8 and 6.9. The marginal values in these figures are almost identical to those in Figures 6.6a and 6.7a (only two migrants are misplaced in Figure 6.8). The interior values of the out-migration sub-margins $O_i^{IJ}$ and the in-migration sub-margins $D_j^{IJ}$, however, are quite different. In these modelled matrices, $\sum_i O_i^{IJ} \neq \sum_j D_j^{IJ}$. For example, the total flows from Country 1 to Country 2 in Figure 6.8 are 6,915, whereas the total flows from Country 1 to Country 2 in Figure 6.9 are 7,776. The reason for this is that the $O_i^{IJ}$ and $D_j^{IJ}$ flows are each constrained only to the marginal totals of their own matrix. In these models, $O_i^{IJ}$ and $D_j^{IJ}$ have multiple equilibria, only a small number of which result in $\sum_i O_i^{IJ} = \sum_j D_j^{IJ}$. This has implications for Model (iv) in our suite of models.
Model (iv) takes $O_i^{IJ}$ and $D_j^{IJ}$ as constraints, with the doubly constrained version of the model defined as
With the balancing factors for (6.27):
If $\sum_i O_i^{IJ} = \sum_j D_j^{IJ}$, then it is possible to solve Equations (6.27) and (6.28): the iterative procedure which calculates the origin and destination balancing factors is able to converge when the two sets of sub-margins for each country pair sum to the same value. If the $O_i^{IJ}$ and $D_j^{IJ}$ values are estimated using the entropy-maximising procedure described in Equations (6.21) and (6.22), then $\sum_i O_i^{IJ} \neq \sum_j D_j^{IJ}$, meaning that the iterative balancing factor routine will not converge and Equations (6.27) and (6.28) cannot be solved.
One solution to this issue is to estimate $O_i^{IJ}$ and $D_j^{IJ}$ using a method other than the entropy-maximising model described. As already noted, $O_i^{IJ}$ and $D_j^{IJ}$ are equivalent to the internal migration margins $O_i^I$ and $D_j^J$. In this system, we already know the values of $O_i^I$ and $D_j^J$ from the internal migration data available. Given this information, the following equations can be used to estimate $O_i^{IJ}$ and $D_j^{IJ}$:
where these $O_i^{IJ}$ and $D_j^{IJ}$ estimates are constrained to the corresponding $M^{IJ}$ values, so that $\sum_i O_i^{IJ} = \sum_j D_j^{IJ}$, and thus it is possible to solve Equations (6.27) and (6.28).
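As a hedged illustration of sub-margin estimates that are constrained to a known country-to-country total, the sketch below assumes a simple proportional allocation: the inter-country flow is shared across origin regions in proportion to their known internal out-migration totals. This proportional form is an assumption made for illustration, the function name is hypothetical, and the regional totals are invented; only the 4,856 total comes from the Figure 6.2 example.

```python
import numpy as np

def allocate_submargins(country_flow, internal_totals):
    """Share a country-to-country flow across regions in proportion to
    the regions' known internal migration totals (assumed allocation rule)."""
    internal_totals = np.asarray(internal_totals, dtype=float)
    shares = internal_totals / internal_totals.sum()
    return country_flow * shares

# Country 1 -> Country 3 total from the Figure 6.2 example.
country_flow = 4856.0
# Hypothetical internal out-migration totals for the three zones of Country 1.
internal_out = np.array([1200.0, 800.0, 2000.0])

sub_margins = allocate_submargins(country_flow, internal_out)
```

Because each region receives a fixed share of the same total, origin-side and destination-side estimates built this way sum to the same country-to-country flow, which is the condition the balancing factor routine needs in order to converge.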
There is, however, an entropy-maximising solution to this issue as well. In Model (iiib), the constraints used to estimate $O_i^{IJ}$ and $D_j^{IJ}$ are not the matrix margins as shown in Figures 6.6 and 6.7. By using these margins in Model (iiia), we are not taking advantage of all known information in the system. As the $M^{IJ}$ flows are known, a combination of matrix margins and known interior values can be used as constraints; thus the equations for $O_i^{IJ}$ and $D_j^{IJ}$ become
with the balancing factors for (6.33) calculated:
and the balancing factors for (6.34):
In constraining $O_i^{IJ}$ and $D_j^{IJ}$ to the known $M^{IJ}$ flows, $\sum_i O_i^{IJ} = \sum_j D_j^{IJ} = M^{IJ}$. This means that when the estimates from Equations (6.33) and (6.34) are used as inputs into (6.27) and (6.28) in Model (iv), the balancing factors will always converge and the equations can be solved. Model (iv) represents the estimates which will adhere most closely to the known information about the system, and as such might be described as the optimum model for the EU system in this study.
If Model (iv) is the optimum model, then Models (v) and (vi), which produce alternative estimates using less information, might be described as being suboptimal. Model (v) will only produce estimates where $I \neq J$. This model can be written as
where
In this model, and can be estimated in exactly the same way as and in Equations (6.31) and (6.32) so
The estimates in Model (v) will not adhere as closely to known values as those in Model (iv), as the constraints are the outer margins on the expanded matrix shown in Figure 6.5.
Finally, Model (vi) models the whole matrix, including the internal $M_{ij}^{II}$ flows. This model (with an origin-specific distance decay parameter) takes the form
where
with the and constraints calculated as in Equations (6.13) and (6.14).
This new family of doubly constrained MLSIMs allows estimates of a full matrix of 287 × 287 flows within the defined European system to be made. While Model (iv) defined in Equations (6.33) and (6.34) will produce estimates which are forced to adhere most closely to the known information in the system, other models in the family, which by definition will produce results constrained to less information, will allow us to examine features of the European migration system which do not fit our model assumptions. In doing this we might, for example, be able to identify areas where it would be prudent to adjust the cost proxy in order to distribute migrant flows more effectively within the system without the ‘helping hand’ that constraints give, or indeed answer the question posed in the introduction to this chapter relating to whether it is feasible to treat the European system as, effectively, an internal migration system where national boundaries have little influence on migration flows. First, however, a number of technical challenges relating to the implementation of the models need to be overcome.
All of the models described in the MLSIM family make use of a calibrated distance decay parameter (or parameters), but in making use of such a parameter, a number of problems present themselves. Firstly, calibration can only be carried out using known data within the system – therefore, the parameter(s) will have to be calibrated using either the internal inter-regional ($M_{ij}^{II}$) flows or the inter-country ($M^{IJ}$) flows. This means that, potentially, these parameters may not be completely appropriate for the inter-country, inter-regional ($M_{ij}^{IJ}$) flows we wish to estimate. In the absence of other means of estimating appropriate parameters, however, it could be argued this is the best option available at this time, and so it is the option we will have to take.
Accepting that available observed data will be used to calibrate the best-fit parameter(s), the next issue relates to the method used to carry out the calculation. Distance decay parameters in SIMs have historically been calibrated using maximum-likelihood techniques employed in computer algorithms – these commonly use iterative procedures to search for the ‘best-fit’ between the estimates created by the model and the sample data. As an aside, while standard iterative procedures are most frequently used in this type of modelling, it should be noted that a significant amount of work has been carried out by Openshaw and colleagues on the calibration of SIMs using genetic algorithms (Diplock and Openshaw, 1996; Openshaw, 1998): an approach perhaps operationalised most recently by Harland (2008) – we will not explore these methods here, but will use a conventional iterative approach. Batty and Mackie (1972) discuss a range of maximum-likelihood calibration methods, but the Newton–Raphson search algorithm has been shown to perform better than most and has been adopted in both the SIMODEL computer program developed by Williams and Fotheringham (1984) and the IMP program developed by Stillwell (1978); both Fortran programs using the search routine to find the parameter estimates which minimise the divergence between the mean value of the total distance travelled in the observed and modelled flow matrices – an approach also used by Pooler (1994). Thanks to its successful implementation in SIMs for migration analysis, the Newton–Raphson algorithm is the one that we choose to use here.
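The calibration idea described above, searching for the parameter at which the modelled mean distance travelled matches the observed mean, can be sketched in a few lines of Python. This is an illustrative sketch only, not a port of SIMODEL or IMP: it assumes a negative exponential decay function, uses a finite-difference approximation to the derivative in the Newton–Raphson update, and all function names and toy data are hypothetical.

```python
import numpy as np

def fit_flows(O, D, cost, beta, n_iter=500):
    """Doubly constrained flows A_i O_i B_j D_j exp(beta * c_ij)."""
    decay = np.exp(beta * cost)
    A = np.ones_like(O)
    B = np.ones_like(D)
    for _ in range(n_iter):
        A = 1.0 / (decay @ (B * D))
        B = 1.0 / (decay.T @ (A * O))
    return (A * O)[:, None] * (B * D)[None, :] * decay

def mean_cost(flows, cost):
    """Flow-weighted mean cost (mean distance travelled)."""
    return (flows * cost).sum() / flows.sum()

def calibrate_beta(observed, cost, beta0=-0.1, h=1e-5, tol=1e-8, max_iter=50):
    """Newton-Raphson search for the beta that matches observed mean cost."""
    O = observed.sum(axis=1)
    D = observed.sum(axis=0)
    target = mean_cost(observed, cost)
    beta = beta0
    for _ in range(max_iter):
        g = mean_cost(fit_flows(O, D, cost, beta), cost) - target
        if abs(g) < tol:
            break
        # Finite-difference slope of the mean-cost gap with respect to beta.
        g_h = mean_cost(fit_flows(O, D, cost, beta + h), cost) - target
        beta -= g * h / (g_h - g)
    return beta

# Synthetic check: build "observed" flows with a known beta, then recover it.
cost = np.array([[1.0, 4.0, 6.0, 9.0],
                 [4.0, 1.0, 5.0, 7.0],
                 [6.0, 5.0, 1.0, 3.0],
                 [9.0, 7.0, 3.0, 1.0]])
O = np.array([400.0, 300.0, 200.0, 100.0])
D = np.array([250.0, 250.0, 250.0, 250.0])
observed = fit_flows(O, D, cost, beta=-0.3)
beta_hat = calibrate_beta(observed, cost)
```

Matching the mean cost is the maximum-likelihood condition for the exponential form; for a power function the same routine could be applied to the logarithm of the cost matrix.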
Initially two versions of the doubly constrained model were run to calibrate a best-fit general distance decay parameter for the whole system. The results of these models are shown in Table 6.1 and are contrasted with a more basic singly constrained model for comparison. Here, a selection of goodness-of-fit (GOF) statistics are displayed – the coefficient of determination (R2), the square root of the mean squared error (SRMSE), the sum of the squared deviations and the percentage of misallocated flows – although they all display very similar findings. It is clear that the doubly constrained model with the inverse power function applied to the distance matrix produces the best fit to the original data, with an R2 of some 87%. This compares to an R2 of 72% for the negative exponential function and 62% for the reference production constrained model.
Table 6.1 Goodness-of-fit statistics for model experiments
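Model | β | R2 | SRMSE | Sum Sq Dev | % Misallocated |
Doubly constrained, negative exponential function | −4.2986 | 0.718 | 39.393 | 10,456,839,051 | 21.554 |
Doubly constrained, inverse power function | −0.9136 | 0.865 | 27.992 | 5,280,098,085 | 17.008 |
Production constrained (reference) | −1.2201 | 0.623 | 45.457 | 13,886,764,628 | 28.131 |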
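Goodness-of-fit statistics of the kind reported in Table 6.1 can be computed from an observed and a modelled flow matrix along the following lines. The exact definitions used for the table are not stated, so the formulas below are assumptions: R2 is taken as the squared Pearson correlation, SRMSE as the root mean squared error divided by the mean observed flow, and the percentage misallocated as half the summed absolute differences relative to total observed flow. The function name and example values are hypothetical.

```python
import numpy as np

def goodness_of_fit(observed, modelled):
    """Return R2, SRMSE, sum of squared deviations and % misallocated
    under the assumed definitions described in the accompanying text."""
    obs = np.asarray(observed, dtype=float).ravel()
    mod = np.asarray(modelled, dtype=float).ravel()
    sum_sq_dev = ((obs - mod) ** 2).sum()
    return {
        "r2": np.corrcoef(obs, mod)[0, 1] ** 2,
        "srmse": np.sqrt(sum_sq_dev / obs.size) / obs.mean(),
        "sum_sq_dev": sum_sq_dev,
        "pct_misallocated": 100.0 * np.abs(obs - mod).sum() / (2.0 * obs.sum()),
    }

# Hypothetical flows, for illustration only.
stats = goodness_of_fit([10, 20, 30, 40], [12, 18, 33, 37])
```

All four statistics move together in most applications, which is consistent with the observation that they display very similar findings.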
The question that follows is: should this overall distance decay parameter be used as the distance decay input to the estimation model? If this parameter is representative of the whole system, then it could be argued that it could. To test this, a model with an inverse power distance decay function (akin to that in the second row of Table 6.1) was run separately for each of the 21 countries in the system comprised of more than a single zone in order to calibrate a series of parameters. The results of these experiments are shown in Table 6.2.
Table 6.2 Goodness-of-fit statistics for inter-regional migration data modelled with a doubly constrained model with a power distance decay β parameter
Country code | Country | R2 | β (power function) |
FI | Finland | 0.996 | −0.754 |
SE | Sweden | 0.974 | −0.771 |
AT | Austria | 0.972 | −0.747 |
HU | Hungary | 0.963 | −0.567 |
SK | Slovakia | 0.948 | −0.773 |
NL | Netherlands | 0.936 | −1.279 |
DK | Denmark | 0.930 | −0.969 |
NO | Norway | 0.919 | −0.814 |
BG | Bulgaria | 0.901 | −0.825 |
CZ | Czech Republic | 0.889 | −0.807 |
UK | United Kingdom | 0.884 | −0.927 |
PL | Poland | 0.877 | −1.068 |
CH | Switzerland | 0.788 | −0.867 |
BE | Belgium | 0.772 | −1.049 |
RO | Romania | 0.745 | −0.763 |
DE | Germany | 0.715 | −0.760 |
IT | Italy | 0.699 | −0.718 |
ES | Spain | 0.621 | 0.154 |
FR | France | 0.549 | 1.093 |
In this instance, we chose the inverse power distance decay function as it was the best-performing function in the experiment. Serendipitously, the power function is scale independent whereas the exponential function is not (Fotheringham and O'Kelly, 1989), meaning we are able to compare the parameters directly. In Table 6.2, we use the R2 value as our measure of goodness of fit. We are aware that there has been some debate over which is the most appropriate metric to use (Knudsen and Fotheringham, 1986); however, R2 is commonly used and for comparative purposes, the choice of statistic has little relevance to the outcome. A number of points can be made about the results displayed in Table 6.2. Firstly, the countries are ranked according to their goodness of fit and we can observe that around half of the list have R2 values over 90%, with Finland, Sweden and Austria ranked the highest – Finland with an exceptionally high R2. It is clear, however, that there is considerable variation in the parameters for each country. This would suggest that it may not be ideal to use the generalised parameter to model flows for the whole EU system. Furthermore, the reliability of some of the parameters can be called into question, with particularly low R2 values for Spain and France – countries which exhibit positive parameter values. The exact way in which these parameters can be understood has been questioned (Fotheringham, 1981); however, one interpretation is that the value can be read behaviourally and the number is an index of the deterrent to migration, with high negative values representing distance being a strong deterrent to migration and low negative values implying that distance is a weak deterrent. Positive values in this context would indicate that distance is an attraction to interaction – that is, the further apart origins and destinations are, the more likely migration is to occur. Clearly this is unlikely to be the case across the whole of Spain and France.
Given this evidence, generalised distance decay parameters are currently poor candidates for inputs into an estimation model for the whole of Europe. A potential solution, therefore, would be to use distance decay parameters which are specific to each NUTS2 zone – a technique first outlined by Stillwell (1978). This returns us to Model (i) and Equations (6.18) and (6.19).
The GOF statistics for Model (i) – taken for all internal migration flows in the system rather than for each separate country – are shown in Table 6.3. Evidently, these models provide much better fits than the generalised parameter models, with R2 values around 93%. A geography to these distance decay parameters can be observed, with the frictional effects of distance operating very differently for in- and out-migration flows across the EU system, as is shown in Figures 6.10 and 6.11. It should be noted that the nature of the algorithm used to carry out this calibration means that where it is not possible to calculate a zone-specific distance decay parameter (e.g. in those countries where data do not exist, such as Greece), a generalised distance decay parameter, which is calculated for the whole system prior to zone-specific calibration, is allocated. Given the results of these experiments, it is these origin and destination-specific parameters calibrated on internal migration data which will be used as distance decay inputs into our later estimation models.
Table 6.3 Goodness-of-fit statistics for Model (i) with origin-specific and destination-specific β parameters
Model equation | R2 | SRMSE | Sum Sq Dev | % Misallocated |
 | 0.928 | 19.802 | 2,642,462,153 | 12.284 |
 | 0.931 | 19.582 | 2,583,959,209 | 12.163 |
The first step is the estimation of margin constraints. The models in the MLSIM family outlined in Section 6.3 which are used to estimate inter-country, inter-regional flows all require some inputs which are not available directly from the data to hand. In addition to the distance decay parameters that will be calibrated only on internal migration data, Models (iiia), (iiib), (iv), (v) and (vi) make use (directly and indirectly) of NUTS2-level international out-migration and in-migration margins. Consequently, sub-models are required to make estimates of these data. As these regional margins must sum to the corresponding country-level totals, it follows that it should be feasible to estimate the NUTS2-level margins from the country-level out-migration and in-migration margins, given the appropriate ratio values. But which are the appropriate ratios to use?
As information at the internal migration level is complete, it might be possible to use the distribution of internal migrants to estimate the distribution of international migrants such that
The assumption here is that the distribution of internal in- and out-migrants within countries is the same as the distribution of immigrants and emigrants moving between countries. But can internal migrant distributions be used to estimate distributions of international migrants within countries accurately? We might expect, for example, capital cities to dominate these distributions with larger urban areas also providing significant origins and destinations at both levels. Is this the case in reality? Figure 6.12 shows the comparable distributions of internal and international migration for a selection of European countries at NUTS2 level (all countries where comparable data exist at this level), taken from Census data from the 2000–2001 census round and compiled by Eurostat. Broadly speaking, there are positive correlations between internal and international migration distributions, although there are some noticeable differences in the correlation coefficients denoted by the R2 values (and the scatter plots). For most countries in the selection, R2 values are over 80%, indicating that internal migration distributions are reasonably good predictors of international migration distributions. For some countries, however, this predictive relationship is weak. Poland, for example, has an R2 value of only 17%, with the Czech Republic (23%) and Switzerland (28%) not faring much better. The reasons for the lack of correlation in these countries are difficult to ascribe, but differences in the perceived attractiveness of particular destinations to internal and international migrants will affect the correlations. Studying Figure 6.12, the scatter plots show that there is very little pattern in the association between internal in-migration and immigration in Poland, although examining Switzerland and the Czech Republic, it appears that were it not for one or two outliers in the scatter plots, the correlation would be far stronger. 
Through mapping the differences between internal and international migrant distributions, it is possible to interrogate these and other outliers a little further.
Figure 6.13 maps the distribution of the differences between the regional shares of internal and international (in-)migration across NUTS2 zones in Europe (where data are available). A number of points should be made about this map. Firstly, all yellow zones signify less than a 1% deviation between the distribution of internal and international migrants – these zones include much of the United Kingdom and large parts of France, Italy, the Czech Republic, Poland and Greece. In these areas, internal migration distributions can be seen to be good predictors of international migration distributions. Secondly, zones in light orange and light green show only up to a 3% deviation – these include most of the rest of France, a number of regions in Scandinavia, the Netherlands, Poland, the United Kingdom, Italy and Greece. Perhaps the most important point of note, however, which becomes very apparent when examining Figure 6.13, is that there appears to be a ‘capital city effect’. The regions containing London, Paris, Madrid, Rome, Amsterdam, Stockholm, Helsinki, Prague, Lisbon and Dublin all exhibit a noticeably higher (average 8.7%) proportion of the national share of international immigrants compared to the national share of internal in-migrants. Some capital cities go against this pattern, although Bern can probably be discounted since, in terms of city status, Zurich (which matches this trend) could be argued to be a city of more importance within Switzerland. Oslo, Athens and Budapest have lower proportions of international immigrants than internal in-migrants, but the city region where a very large trend in the opposite direction occurs is Warsaw in Poland. Here, the proportion of internal migrants to Warsaw is over 18% higher than the proportion of international migrants.
The fact that Warsaw is an attractive destination for internal migrants would not be surprising, but why it accounts for a much larger proportion of these migrants compared to international migrants is unclear without further investigation of the particular motivations of migrants in Poland.
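The quantity mapped in Figure 6.13 can be illustrated with a minimal sketch. The region names and counts below are invented, the function name `share_deviation` is hypothetical, and the deviation is assumed to be the international share minus the internal share, expressed in percentage points.

```python
def share_deviation(internal, international):
    """Percentage-point gap between each region's share of national
    immigration and its share of national internal in-migration."""
    internal_total = sum(internal.values())
    international_total = sum(international.values())
    return {
        region: 100 * (international[region] / international_total
                       - internal[region] / internal_total)
        for region in internal
    }


# Hypothetical country with three regions (not real data).
internal_in = {"Capital": 3000, "North": 4000, "South": 3000}
immigration = {"Capital": 450, "North": 300, "South": 250}

deviation = share_deviation(internal_in, immigration)
for region, value in deviation.items():
    print(f"{region}: {value:+.1f} percentage points")
```

In this made-up case the capital region takes a larger share of immigrants than of internal in-migrants, which is the pattern described as the capital city effect; a region like Warsaw would show a large negative value instead. Because both sets of shares sum to one, the deviations within a country always sum to zero.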
Based on this, it could be argued that if this capital city effect could be accounted for consistently, and the proportions of migrants associated with other regions in the country adjusted accordingly, then internal migration distributions could be used to produce relatively reliable international migration margin estimates, assuming that these associations hold over time.
Incidentally, the time dimension provides us with another option for modelling the sub-national distributions of international migrants. Where decennial census (or other periodic) data can provide sub-national immigrant distributions, if country-level immigrant data are available, sub-national distributions can be estimated with the formula:
Even if more up-to-date national data are not available, an assumption could be made that these ratios hold over time so that
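One reading of this time-based approach is a simple proportional allocation: regional shares observed in a census year are applied to a national total for a later year. The sketch below works under that assumption only; the function name `distribute_national_total`, the region labels and all figures are hypothetical.

```python
def distribute_national_total(census_regional, national_total):
    """Allocate a national immigrant total to regions using the
    regional shares observed in an earlier census."""
    census_total = sum(census_regional.values())
    return {
        region: national_total * count / census_total
        for region, count in census_regional.items()
    }


# Hypothetical census-year regional immigrant counts (not real data).
census_2001 = {"A": 600, "B": 300, "C": 100}

# Hypothetical country-level immigrant total for a later year.
national_2006 = 2000.0

estimates_2006 = distribute_national_total(census_2001, national_2006)
print(estimates_2006)
```

The regional estimates sum to the national total by construction, so the approach is only as good as the assumption that the census-year shares still hold in the later year.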
Returning to Equations (6.47) and (6.48), unfortunately the nature of the data collated by Eurostat means that it is not possible to assess whether emigrant distributions also follow the distributions of internal out-migrants (these data are census/population register data relating to resident populations in recording countries and therefore cannot contain emigrant data). Given the high degree of association between internal in- and out-migration distributions (Figure 6.14), it might be reasonable to use international immigrant distributions to estimate international emigrant distributions, but the capital city effect would need to be explored before this could be done with confidence. Here our concern is to present a general methodology for estimating the full EU matrix of NUTS2 flows and so we will not dwell on this element of the estimation process at this stage, although it should be stressed that the estimation of the marginal values will have an important bearing on the reliability of the final modelled outputs.
As a consequence of the data to hand and the investigations of internal/international migration associations, at this stage internal migration distributions will be used to estimate the marginal values for the model as in Equations (6.47) and (6.48), but we recognise that this is an area of the methodology which could be improved in the future.
A file containing the full suite of estimates from Models (iv), (v) and (vi) is publicly available to anyone wishing to make use of the data through the following link:
Model (iv) takes in and inputs from Model (iiib) and can be viewed as the optimum model as any outputs will be constrained to known flows and estimated and margins (where and estimates used these constraints). Models (v) and (vi), in contrast, are suboptimal as estimates will not be constrained to flows, only and , or and margins. Running suboptimal models is an important part of the model-building process as they allow us to explore the reliability of some of the general model assumptions; however here, we present just the results from the optimum Model (iv) of the family of MLSIMs.
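To give a feel for what constraining estimates to margins involves, the sketch below applies generic biproportional (iterative proportional) fitting to a small seed matrix. It is not the authors' MLSIM and does not reproduce Models (iv) to (vi); the function name `biproportional_fit`, the seed values and the margin totals are all hypothetical.

```python
def biproportional_fit(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a seed matrix until its row and column sums match the
    target margins (the two sets of targets must share the same total)."""
    matrix = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(max_iter):
        # Scale each row to its target origin total.
        for i, row in enumerate(matrix):
            row_sum = sum(row)
            if row_sum > 0:
                factor = row_targets[i] / row_sum
                matrix[i] = [value * factor for value in row]
        # Scale each column to its target destination total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in matrix)
            if col_sum > 0:
                factor = target / col_sum
                for row in matrix:
                    row[j] *= factor
        # Columns now match exactly; stop once rows match as well.
        row_error = max(abs(sum(row) - t) for row, t in zip(matrix, row_targets))
        if row_error < tol:
            break
    return matrix


# Hypothetical seed pattern and margins for three regions (not real data).
seed = [[1.0, 2.0, 1.0], [3.0, 1.0, 1.0], [1.0, 1.0, 2.0]]
out_totals = [100.0, 200.0, 50.0]   # assumed out-migration margins
in_totals = [150.0, 120.0, 80.0]    # assumed in-migration margins

fitted = biproportional_fit(seed, out_totals, in_totals)
for row in fitted:
    print([round(value, 1) for value in row])
```

The fitted matrix keeps the relative pattern of the seed as far as the margins allow. A model that is additionally constrained to known flows would fix those cells as well, which is why such a model can be expected to track observed data more closely than one constrained to margins alone.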
MLSIMs offer the opportunity to examine inter-regional flows between all countries in our chosen EU system. Examining all flows, or even all significant flows, would be an extensive task; we therefore take the United Kingdom as an example. Figure 6.15 depicts all flows over 200 persons entering UK regions from other EU regions, and it is clear that particular origin and destination combinations predominate. Firstly, the importance of London and the South-East corner of the United Kingdom is very apparent – nearly all flows are concentrated in this area, with only a small number entering regions containing other large cities such as Manchester and Birmingham. A large number of these flows originate in Polish regions and many terminate in London. Interestingly, flows from Poland into East Anglia, which have gained much media attention in the United Kingdom, are picked up by the model, even though other explanatory factors such as increased job opportunities in the agricultural sector are not taken into consideration by the model. One small caveat in relation to these flows can be made, referring back to our observations in Section 5.1 about the poor relationship between internal and international migration distributions in Poland. Where the relationship between these distributions is poor in Poland, some of the precise flow volumes originating from these Polish NUTS2 regions should be treated with caution. Where these relationships are stronger, as in France and Spain, the large flows from other major capital cities such as Paris and Madrid can be viewed as more reliable; indeed, given the ‘capital city effect’ noted earlier, these flows may be even larger in reality. High-volume flows are also noticeable from Cyprus, although these may well be associated with the movement of armed forces.
Examining the flows out of UK regions to the rest of the EU system (Figure 6.16), the South-East – and especially London – predominates, as with immigration. Destinations for migrants leaving the United Kingdom in 2006 are quite different from the origins of those arriving. Large volumes of migration (which we may assume are related to retirement) can be observed flowing into Spanish regions, including those containing the largest cities of Madrid, Barcelona and Valencia, as well as the Costa del Sol. Large flows can also be observed from London and other regions of the United Kingdom into Ireland – this is partially a function of Ireland consisting of only two regions, so that these flows appear more concentrated, although the close ties between all countries of the United Kingdom and Ireland mean that these flows are entirely expected.
In this chapter, we have introduced a new family of models for estimating inter-regional migration flows in Europe. Our guiding principle was a simple one – to make use of the maximum amount of available data (embodied in the constraints imposed within the model and the parameters used to influence the patterns) to produce estimates of the maximum likelihood given the information available.
The estimates produced by Model (iv) represent the current ‘best-guess’ given the data to hand. They embody all known information about flows into and out of countries, the behaviour of internal migrants within their home countries and the relationships between the destination preferences of internal and international migrants. There are, of course, a number of areas where these estimates could be improved. Firstly, the country-level international migration data constraints are themselves estimates. The data used were taken from the MIMOSA project (Raymer and Abel, 2008); the authors recognise the limitations of these data, which will soon be superseded by improved estimates from the IMEM project mentioned in the introduction. When these model inputs can be improved, there will be a knock-on improvement to our own estimates. Secondly, we have already acknowledged that there are issues with the methodology we employed to estimate the matrix margins which formed constraints, either directly or indirectly, for all models. As outlined, in these estimates we have simply taken the national distributions of internal migrants to distribute international migrants. While there are high correlations between these distributions for in-migration, demonstrated across Europe from Census and register data, a ‘capital city effect’ persists where these destinations can attract up to 10% more migrants internationally than internally. Furthermore, we have been unable to ascertain whether a similar situation exists for out-migration flows. Finally, in using distance decay parameters calibrated with internal migration data, we could be introducing error where internal migration flows, even in an open border Europe, act very differently to international flows. Experimentation with suboptimal models which are not reported in this chapter suggests that this might be the case, with country border effects far stronger than the in- and out-migration constrained models estimate.
Model (iv) constrains inter-regional estimates to known (estimated) inter-country flows, allowing us to explore the likely inter-regional international flows within Europe. This is an important development as, for the first time, we are able to examine pressure points within the migration system at a much higher resolution than previously possible. Not shown in this chapter, but evident in the results which are available through the web link given at the beginning of Section 6.6, are the regions in central and southern Spain which are likely destinations for the large influx of migrants from Romania, along with the areas of Romania which are equally (if not more) affected socially, demographically and economically by these large flows of people. In the United Kingdom, we have shown the localised concentrations of migrant flows, particularly into London, the South-East and East Anglia, and especially from regions in Poland.
Even in this optimum model, there are improvements that can be made. Now that the modelling framework is in place, improved outputs can be achieved very easily as soon as improved inputs can be supplied to the model. In this chapter we have concentrated on 2006, but data (albeit less comprehensive) for other years exist, and so a natural extension to this work would be to explore the temporal dynamics of particular sets of inter-regional flows in the system. Furthermore, in this analysis, we have chosen Europe to exemplify our models, but clearly we need not be limited to Europe – the model can easily be applied to estimate sub-national flows in a global context, opening up exciting possibilities for a more complete global sub-national understanding of migration. With a wider spatial system and a broader temporal base, the model framework introduced in this chapter should provide a useful tool for policy decisions related to demographic trends both in Europe and further afield.