Chapter 7
Estimating an Annual Time Series of Global Migration Flows – An Alternative Methodology for Using Migrant Stock Data

Adam Dennett

7.1 Introduction

Globally, international migration flow data are poor. Putting aside issues of data quality and definitional nuances for one moment and concentrating just on coverage, it is fair to say that little is known of the annual flows of people between the circa 250 countries which make up the political map of the world. If we imagine a large two-dimensional data matrix with each of the world's countries heading the rows (as origins) and columns (as destinations), with the interior cells of the matrix containing counts of migrants moving between each pair of countries over the course of a year, then for any randomly chosen year, the chances that there are non-zero data filling even a small proportion of the cells will be low.

Even if the matrix happens to be representing flows in the first or second year of a decade when, curiously, many of the countries that still carry out a full population census do so almost in synchronisation with each other, coverage would be patchy. The matrix may be slightly more populated with data than in other years, but it would very likely be characterised by vertical lines of data with far fewer horizontal lines – this is because Censuses are quite good at counting people into countries, but rather poor at counting people out (it is hard to count someone after they have left).

Some sections of the matrix, even in a non-census year, would be rather more populated with data than others. The section containing the countries of Europe would be relatively complete across a number of years. National statistical agencies in EU countries have been bound by legislation for some time now to supply the European statistical agency, Eurostat, with estimates of immigration. Many of the previously empty cells in this section of the matrix have been filled by various academic projects (Abel, 2010; Raymer and Abel, 2008a,b; de Beer et al., 2010; Raymer et al., 2013; Wiśniowski et al., 2013). These projects have been successful both in harmonising immigration and emigration estimates between countries (estimates which, among other things, vary due to definitional differences and differences in the methods or systems used to record these moves) and in estimating flows between countries where data are incomplete.

Flows into or out of some of the more economically developed countries in the world (some of the matrix margins rather than the internal cells) will also be a little more detailed. Members of the Organisation for Economic Co-operation and Development (OECD), for example, supply the organisation with information on total inflows and/or outflows of migrants and have been doing so for a decade or so.

Some sections of the matrix would be particularly sparse and not just because there are some combinations of countries between which people do not often migrate (e.g. those with small populations), but because detailed censuses, population registers, re-purposed administrative data or border crossing surveys are simply not available to provide estimates of the true flows that take place. Compounding this lack of statistical infrastructure, parts of the world such as those recently ravaged by war or civil unrest experience a frequent ebb and flow of refugees and more permanent migrants across porous borders which are exceedingly hard to measure accurately.

Observing this situation from afar, one may be forgiven for thinking that the poor state of international migration flow data must be indicative of a lack of importance – surely governments across the world, if they were really interested in these flows, would put in place better systems for capturing population movements? But can this really be the case? Among countries in Europe, North America and Australasia, issues such as immigration are rarely far from the top of the political agenda; indeed, a recurrent news item in the United Kingdom is the latest cry from a Member of Parliament, pressure group or think tank that the migration statistics in the country are ‘not fit-for-purpose’.[1] While the concerns voiced aloud by politicians tend to be responses to voter anxieties about jobs, or posturing in relation to those who are seen as deserving (or not) of social benefits, some of the less politically motivated worries about immigration can be traced back to the pragmatic concerns of planners who, understandably, want good information about the populations residing in different parts of a country so that service provision can be met effectively. While births and deaths can be recorded quite accurately, it is the volatility of migration and the quality of the statistics that can cause the largest headaches (UKSA, 2009).

In this context, it is perhaps surprising that very little work has been carried out to try to populate the sparse matrix of global migration flows, although the parochial nature of policy-making means that, in the main, what is happening in the rest of the world is of little immediate importance to those making national decisions. Even if local statistics are tackled, the collective will from politicians to address the global lack of data appears to be, at best, weak. Beyond official statistics, where governmental interest in global migration flows is apparently lacking, academic interest has been not much more buoyant. Notable exceptions are the recent efforts by Abel (2013a,b), who has attempted to generate a series of 5-year migration flow estimates for the whole world by comparing changes in bilateral (origin/destination) migration stock data. Migrant stocks can be viewed as running totals of migration flows and as such the two are directly related. These stocks are recorded as counts of people who are either born in another country or citizens of another country; the latter can change with naturalisation (or less frequently denaturalisation) and so country of birth is often the more straightforward measure to use – indeed stocks are counts of foreign-born people in about 80% of the countries that collect these data.[2] Changes in these stocks (after accounting for deaths – and births in the case of foreign national stocks) are due to migration flows. Therefore, by comparing the stocks of migrants born in different countries at 5-year intervals, Abel (2013a) is able to generate estimates of the migration flows between these countries for 5-year time periods.

Abel's method is innovative and makes intelligent use of stock data which are published by the World Bank. In order to validate the effectiveness of the model employed, his model estimates are compared with the published net migration data. These comparisons show encouraging correlations, but as vastly different in- and out-migration data can produce the same net flow, there may be problems with validating against such statistics. As such it is possible, even with good net migration correlations, that poor flow estimates are generated. The reason Abel chooses to use net migration to validate his model is because other data are not available for the whole system – hence the existence of the problem in the first place. But flow data do exist for sections of the global matrix so it is possible to carry out a partial validation on the gross flows rather than the net – this is exemplified below.

The best inter-country migration estimates available at this time come from the IMEM project[3] (Raymer et al., 2013; Wiśniowski et al., 2013). This work is an evolution from the earlier work of the MIMOSA[4] project, which produced similar, but less accurate, estimates of the same intra-European flows. These annual flow estimates are for the years 2002–2008 which, unfortunately, cannot be aggregated to cover either the 2000–2005 or 2005–2010 periods Abel uses. However, whilst a little crude, dividing Abel's estimates by 5 gives an approximation of the estimated single-year flow volume which can be compared with each of the IMEM migration years. The IMEM data are not without issues, so any disagreement should not be taken as definitive evidence of a problem, but it may give an indication of possible complications in Abel's estimation process.

Table 7.1 shows that the estimates generated using Abel's method are some way from the IMEM estimates for the 29 countries in the European system. Total annual flow volumes (around 400,000 between 2002 and 2008) are a long way from the figures of between 1.2 and 1.9 million estimated by IMEM. More importantly, the correlations represented by the overall coefficient of determination (R²) values for the 841 origin/destination interactions are low (2002 and 2007 were chosen as being close to the mean year of each 5-year period). The R² value gives an indication of how closely Abel's estimates replicate the overall structure of the IMEM data (the variations in the large and small flow volumes between different origins and destinations), and it shows that they are not very close.

Table 7.1 Comparison of Abel's 5-year migration flow estimates with IMEM data

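| Year | IMEM data: total migrants | Abel's 5-year estimates/5: total, 2000–2005 | Abel's 5-year estimates/5: total, 2005–2010 | R², 2000–2005 | R², 2005–2010 |
|------|---------------------------|---------------------------------------------|---------------------------------------------|---------------|---------------|
| 2002 | 1,246,762 | 407,792 |         | 0.24 |      |
| 2003 | 1,257,237 | 407,792 |         |      |      |
| 2004 | 1,453,802 | 407,792 |         |      |      |
| 2005 | 1,494,601 | 407,792 | 419,727 |      |      |
| 2006 | 1,639,059 |         | 419,727 |      |      |
| 2007 | 1,849,528 |         | 419,727 |      | 0.15 |
| 2008 | 1,860,165 |         | 419,727 |      |      |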

It is clear then that there is scope to improve upon the estimates of global migration flows that are presently available in terms of both their temporal resolution (annual estimates would be preferable to 5-year transitions) and their distribution. Taking the latter, we can observe that while Abel makes use of bilateral migrant stock data collated by the World Bank/United Nations, his method employs an inverse distance function in order to distribute flows estimated from differencing stocks and disregards perhaps the most valuable information contained within: the large hints towards likely origin/destination interactions contained within the accumulation of historical flows. Using distance to distribute migration flows within a spatial interaction model is, of course, well established (Flowerdew and Lovett, 1988; Willekens, 1999; Stillwell, 1978; Fotheringham et al., 2004), but distance is only ever an imperfect proxy for the observed distribution of flows. Where an empirical record of historical flows exists, as it does with the bilateral stock data, it should be possible to achieve far more accurate estimates using this as a distribution function. In his introduction, Abel discounts the use of stock data on the basis of the comment from Massey et al. (1999) that migrant stock data can ‘yield a misleading portrait of the current migration system’ (Abel, 2013a, p. 507), as migration flows are far more volatile. But by not using the considerable amount of evidence on the potential preferences of migrants that is contained in stocks, we are ignoring some vital data. Distances between countries do not change at a perceptible rate (certainly not over the century or so at most that modern migration scholars are likely to be interested in) and so are even less flexible as a distribution term than stocks.

Massey's assertion holds even less water when we look at the evidence. Studying Figure 7.1, there is a clear (log) linear relationship between bilateral stocks (as a proportion of all stocks) and bilateral flows (as a proportion of all flows) in the European system (where we have data to carry out such a comparison). As such, it is reasonable to assume that a similar proportional relationship exists between stocks and flows in the rest of the global system, and this relationship can be employed to achieve more accurate estimates of global flows.

Figure 7.1 Comparison between bilateral stock and flow proportions for European countries (R² = 0.794)

7.2 Methodology

7.2.1 Introduction

The following section will detail a new methodology for taking advantage of the structure inherent in migration stock data to produce an alternative set of global estimates. The method is very straightforward and makes a number of crude assumptions, but despite this, the outcomes are still an improvement on current best estimates.

7.2.2 Calculating Migration Probabilities

The World Bank publishes bilateral migrant stock data for the years 1960, 1970, 1980, 1990, 2000 and 2010.[5] These data are collated from censuses and population registers and represent total stocks in these years, primarily for foreign-born migrants (Özden et al., 2011). The first stage of the new estimation process is to calculate bilateral migration rates or probabilities relative to the total stock population:

$$p(S_{ij}^{t}) = \frac{S_{ij}^{t}}{S^{t}} \tag{7.1}$$

where $p(S_{ij}^{t})$ is the probability of being born in country $i$ and currently residing in country $j$ at a particular time $t$, which is the ratio of the stock of migrants born in country $i$ and currently residing in country $j$, at this time $t$, to the total global stock $S^{t}$ of migrants residing in a different country to their birth:

$$S^{t} = \sum_{i}\sum_{j} S_{ij}^{t} \tag{7.2}$$

The probability of a migration flow between country $i$ and country $j$ in any given year, $p(M_{ij}^{t})$, can be calculated in a very similar way such that

$$p(M_{ij}^{t}) = \frac{M_{ij}^{t}}{M^{t}} \tag{7.3}$$

where $M_{ij}^{t}$ is the observed migration flow (transition) between origin country $i$ and destination country $j$ in year $t$ and $M^{t}$ is the total of all migration flows in the system at this time:

$$M^{t} = \sum_{i}\sum_{j} M_{ij}^{t} \tag{7.4}$$

If we assume that the relationship in Figure 7.1 holds for the whole global system, then we can say that the flow probability is approximately the same as the cumulative historical flow probability captured by the bilateral stock relationship $p(S_{ij}^{t})$:

$$p(M_{ij}^{t}) \approx p(S_{ij}^{t}) \tag{7.5}$$

Therefore, if we are able to calculate or estimate the total number of migrants moving around the world in a given year, then it is elementary to estimate bilateral migration flows, $M_{ij}^{t}$, from the stock probabilities:

$$M_{ij}^{t} = M^{t}\, p(S_{ij}^{t}) \tag{7.6}$$
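As a minimal sketch of this proportional allocation, the Python below converts a small stock matrix into stock probabilities and scales them by an assumed annual total, which is how Equation (7.6) is described in the text. The three-country matrix, the total of 2,000 migrants and the function names are illustrative assumptions, not values or code from the chapter.

```python
# Hypothetical bilateral stock matrix: rows are countries of birth,
# columns are countries of residence. All numbers are invented.
# The diagonal is zero because people living in their country of birth
# are not part of the migrant stock.
stocks = [
    [0, 400, 100],
    [250, 0, 50],
    [150, 50, 0],
]


def stock_probabilities(stock_matrix):
    """Return each cell as a share of the total stock in the system."""
    total_stock = sum(sum(row) for row in stock_matrix)
    return [[cell / total_stock for cell in row] for row in stock_matrix]


def estimate_flows(stock_matrix, total_migrants):
    """Distribute an assumed system-wide total across origin/destination pairs."""
    probabilities = stock_probabilities(stock_matrix)
    return [[total_migrants * p for p in row] for row in probabilities]


total_migrants = 2000  # assumed total number of migrants in one year
flows = estimate_flows(stocks, total_migrants)

for origin, row in enumerate(flows, start=1):
    print(f"From country {origin}:", [round(value) for value in row])
```

Because every estimated flow is a fixed share of one total, the estimates always sum back to that total, which is why the quality of the total matters so much in what follows.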

7.2.3 Calculating Total Migrants in the Global System

Equation (7.6) is, in effect, equivalent to the unconstrained (or total constrained) spatial interaction model proposed by Wilson (1971), but it does of course leave the non-trivial problem of estimating the total number of migrants in the system, $M^{t}$. One obvious solution would be to collate data on immigration or emigration. The widely available net migration statistics for countries would suggest that these data are in existence, as conventionally net migration data are calculated as immigration minus emigration. Consider Table 7.2, which represents a sample migration system:

Table 7.2 Sample migration flows between countries in a three-country system

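| Origin \ Destination | Country 1 | Country 2 | Country 3 | Total emigration | Net migration (in − out) |
|----------------------|-----------|-----------|-----------|------------------|--------------------------|
| Country 1 | 0 | 120 | 300 | 420 | −295 |
| Country 2 | 50 | 0 | 92 | 142 | 178 |
| Country 3 | 75 | 200 | 0 | 275 | 117 |
| Total immigration | 125 | 320 | 392 | 837 | |
| Net migration (out − in) | 295 | −178 | −117 | | |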

In this system with total immigration (c07-math-0031), emigration (c07-math-0032) and represented as the inner set of matrix margins (superscript for time, c07-math-0033, is implicit), net migration (c07-math-0034 and c07-math-0035) is calculated such that in this closed system:

7.7 equation

where

7.8 equation
7.9 equation

and which also means that

7.10 equation

where we have a full and complete data set:

7.11 equation
7.12 equation

As such

7.13 equation

In demographic accounting, the net migration balance is conventionally calculated as in-migration minus out-migration rather than the reverse. However, where data on in- or out-migration are unavailable or unreliable, net migration is also calculated as the residual difference between the overall population growth rate and the rate of natural increase (births minus deaths) (UN, 2013). In the latest revision of the UN global net migration estimates,[6] of the 201 countries for which net migration data are available, roughly three-quarters (148) have their statistics calculated in this way.

Now this, of course, poses a problem if we are hoping to use immigration or emigration statistics to calculate the total number of migrants – the relative abundance of net migration statistics belies this lack of immigration or emigration information. As is evident in Table 7.2, the positive or negative net balance of migrants will always be some way short of the total migrants – although the extent of the undercount is unknown. The impression from Table 7.2 is that the positive or negative net balance (295 migrants) is quite a long way short of the total migrants in the system (837). But this stylised system has migrants flowing in both directions from each country in a relatively balanced way – in reality, for most bilateral country interactions, net migration balances will be asymmetric, especially where flows are between the Global South and countries in the more economically developed regions of the world.
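The accounting in Table 7.2 can be checked directly. The short Python sketch below recomputes the margins from the nine interior cells of the table; the variable names are illustrative choices, and net migration is taken as immigration minus emigration, following the convention described in the text.

```python
# Interior cells of Table 7.2: rows are origins, columns are destinations.
flows = [
    [0, 120, 300],  # from Country 1
    [50, 0, 92],    # from Country 2
    [75, 200, 0],   # from Country 3
]
n = len(flows)

# Row totals are emigration, column totals are immigration.
emigration = [sum(row) for row in flows]
immigration = [sum(flows[i][j] for i in range(n)) for j in range(n)]

# Net migration per country, as in-migration minus out-migration.
net = [immigration[k] - emigration[k] for k in range(n)]

total_migrants = sum(emigration)
positive_net_balance = sum(value for value in net if value > 0)

print("Emigration:", emigration)
print("Immigration:", immigration)
print("Net migration:", net)
print("Total migrants:", total_migrants)
print("Positive net balance:", positive_net_balance)
```

In a closed system the net balances cancel, so the positive balance (295) equals the size of the negative balance, and in this example it captures only about 35% of the 837 moves.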

So the question is: how asymmetric are the flows? Are they asymmetric enough for it to be reasonable to use net balance data as a proxy for total migration flows in our system? We know that the net balance will always be an underestimate of the true number of migrants, but it is, at least, an empirical observation. If the underestimation is too severe, then it will not be practical to use. Figure 7.2 gives us some clues as to the level of underestimation by comparing the positive net migration balance between 1965 and 2010 with the total migration estimated by Abel as the difference in foreign-born migration stocks (adjusted for deaths) between two successive time periods.

Figure 7.2 Comparison between total global migration flows (5-year periods) as estimated by Abel (2013b) and UN net migration positive balance (5-year periods), 1965–2010

Figure 7.2 shows the relationship between the UN net migration data and the total migrant estimates derived from Abel's (2013b) flow estimates. There is clearly an association over time between the two sets of statistics, which is broadly linear. Both totals increase at roughly the same rate over the period, and there are also similarities in the flattening out in terms of growth between 1980 and 1990, as well as a similar outlier in 1995. The difference between the two total estimates remains at around 10 million migrants between 1965 and 2005, with the absolute difference only really departing from this gap in 2010 when the difference is closer to 20 million. One interesting outcome of plotting these two data sets against each other is that we can get a sense of the asymmetry of net migration balances over time. The closer the two statistics for a given year, the more asymmetric the flows in the system (as there is less cancelling of immigration with emigration) – the further apart, the less asymmetric (i.e. people tend to be migrating to and fro from countries in both directions). As we can see in Figure 7.2, the gap is widening slightly over time, signifying that people are less likely to flow just in one direction. Given the increase in globalisation and cyclical migration over this period, along with the overall increase in flows, this is probably what we would expect.

7.2.4 Generating a Consistent Time Series of Migration Probabilities

The next stage in the estimation process is to use the estimates of c07-math-0045 to generate sets of bilateral flow estimates, c07-math-0046, using the corresponding stock probabilities, c07-math-0047. To do this for the years where data exist is very straightforward, although some added complications for time periods emerge. As previously outlined, UN (World Bank) bilateral migration stock data (and therefore probabilities) exist for six single years, on the decade, from 1960 to 2010. These single years are represented with the superscript index, c07-math-0048. Total migrant data estimated from either net migration balances or Abel's (2013b) data set exist for 5-year periods, c07-math-0049, where c07-math-0050 and

7.14 equation

which means that as it stands:

7.15 equation

Therefore, in order to proceed, we need to harmonise the time periods for either the stock probabilities or the total migration estimates, so that we arrive at either the estimate shown in Equation (7.6) or an alternative:

7.16 equation

Either way, we are presented with two options: estimate c07-math-0054 from c07-math-0055 or estimate c07-math-0056 from c07-math-0057 – both of which have options which will influence the final flow estimates. We will begin with the estimation of c07-math-0058 from c07-math-0059 (consider Figure 7.3). Figure 7.3 shows the data points for the total flow estimates, c07-math-0060, from Abel's data, where c07-math-0061. Our first task is to decide whether c07-math-0062 is the true value of c07-math-0063 or just an approximation. In Figure 7.3, we can observe from the plotted points that while in general the value of c07-math-0064 increases over time, there is noticeable fluctuation in the values. These might be real fluctuations, but perhaps more probably they correspond to errors in data collection. If we believe that the trend is more probable than the observed data, then new values of c07-math-0065 could be calculated. The most obvious way to do this, as shown by the straight purple line, is to fit a regression line through the points, minimising the sum of the squared differences between the line and the residual data points. Alternatively, we can retain some of the fluctuation in the original values of c07-math-0066 by using some kind of moving average. The pink and blue lines in Figure 7.3 demonstrate a two-case and three-case moving average, respectively. Clearly, the choice of starting values for c07-math-0067 will influence any subsequent calculation of c07-math-0068.

Figure 7.3 Smoothing options for total migrant estimation

Once we have settled on a set of values for c07-math-0069, then assuming a linear trend between any of the values of c07-math-0070, we are able to estimate c07-math-0071. Perhaps the most straightforward (although crude) option is to assume that the regression line is the best estimate of all values of c07-math-0072. When c07-math-0073 is the sum of all flows during a 5-year period, then

7.17 equation

Using the values shown in Figure 7.3 (the purple linear regression line), for c07-math-0075, c07-math-0076. For c07-math-0077. As we are assuming the same linear relationship between all values of c07-math-0078, then by simply calculating the slope and intercept values, we are able to estimate total migrants for any year in the range. As c07-math-0079 is the arithmetic mean of the 5 previous years summing to c07-math-0080, then where c07-math-0081.
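The smoothing options and the step from 5-year totals to annual totals can be sketched as follows. This is one crude reading of the procedure, assuming that the annual total for a year is the fitted 5-year total for the period ending in that year divided by 5, and that the moving averages are trailing averages. The 5-year totals are invented numbers in millions of migrants, not Abel's estimates, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def moving_average(values, window):
    """Trailing moving average; the first window - 1 points have no value."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical 5-year totals (millions), indexed by period-end year.
period_end_years = list(range(1965, 2011, 5))
five_year_totals = [12.0, 14.0, 13.0, 17.0, 18.0, 18.0, 25.0, 24.0, 28.0, 35.0]

slope, intercept = fit_line(period_end_years, five_year_totals)


def annual_total(year):
    """Fitted 5-year total for the period ending in `year`, divided by 5."""
    return (slope * year + intercept) / 5


# Alternative smoothers that retain some of the fluctuation.
two_case = moving_average(five_year_totals, 2)
three_case = moving_average(five_year_totals, 3)

for year in (1965, 1990, 2010):
    print(year, round(annual_total(year), 2))
```

The regression line removes all fluctuation, whereas the moving averages keep part of it at the cost of losing the earliest points in the series.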

7.2.5 Producing Annual Bilateral Estimates

We now have a series of total flow estimates for all years between 1965 and 2010 and bilateral flow probabilities, approximated by the stock probabilities $p(S_{ij}^{t})$, for 6 years, $t$: 1960, 1970, 1980, 1990, 2000 and 2010. Using Equation (7.6), it is straightforward to apply these probabilities to estimate bilateral flows for each of these 6 years. Interpolating the probabilities for the years between each pair of known decades means that a full set of estimates can be produced for all intervening years.
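A minimal sketch of the interpolation step for a single origin/destination pair is given below, assuming linear interpolation between the two nearest stock years. The probabilities, the annual total and the function name are hypothetical values chosen for illustration.

```python
def interpolate_probability(year, known):
    """Linearly interpolate a probability for `year` from a {year: p} dict."""
    years = sorted(known)
    if not years[0] <= year <= years[-1]:
        raise ValueError("year is outside the range of known stock years")
    for start, end in zip(years, years[1:]):
        if start <= year <= end:
            weight = (year - start) / (end - start)
            return known[start] + weight * (known[end] - known[start])
    return known[years[0]]  # only reached when a single year is known


# Hypothetical stock probabilities for one origin/destination pair.
known_probabilities = {1990: 0.010, 2000: 0.014}

# Hypothetical total number of migrants in the system in 1995.
annual_totals = {1995: 20_000_000}

p_1995 = interpolate_probability(1995, known_probabilities)
flow_1995 = annual_totals[1995] * p_1995

print(round(p_1995, 4), round(flow_1995))
```

Because the same weight is applied to every pair, and the probabilities in each stock year sum to one, the interpolated probabilities for an intervening year also sum to one.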

7.3 Results and Validation

7.3.1 Introduction

The aim of this estimation exercise was to generate a set of global bilateral flow estimates with values more accurate than others presently available. Having generated a new time series of estimates, we are now able to assess their validity (as far as is possible). In addition to the IMEM data for Europe mentioned earlier, some empirical flow data exist for a small set of countries around the world and are collected by the United Nations.[7] Kim and Cohen (2010) collate some of these data for analysis in their paper and it is these data we make use of here. Kim and Cohen's outflow data are recorded flows from 13 developed world countries (Australia, Belgium, Croatia, Denmark, Finland, Germany, Hungary, Iceland, Italy, New Zealand, Norway, Sweden and the United Kingdom) between 1950 and 2007. Their inflow data are to these same countries, plus Canada, France, Spain and the United States, over the same time period (although for both inflows and outflows, not all years are available for every origin/destination pair).

7.3.2 IMEM comparison

As noted earlier, the comparison with IMEM data is not ideal as we are having to divide Abel's estimates by 5 to get an approximation of the 1-year migration estimates. If the comparisons are close, then this will be more of a problem than if there are significant differences between the estimates. Table 7.3 presents three key statistics for the years 2002 and 2007: the total estimated migrants for the 841 interactions; the coefficient of determination (R2) for all flows compared with the equivalent IMEM data – which gives an indication of the overall correlation; and the root mean squared error – which gives an indication of the accuracy of the estimates.

Table 7.3 Goodness-of-fit statistics for new estimates and Abel's estimates when compared with flows in Europe in 2002 and 2007 from the IMEM project

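| Year | Goodness of fit | IMEM | New estimates | Abel estimates |
|------|-----------------|------|---------------|----------------|
| 2002 | Total | 1,246,762 | 673,056 | 407,792 |
| 2002 | R² | | 0.44 | 0.24 |
| 2002 | R² log(data) | | 0.77 | 0.40 |
| 2002 | RMSE | | 3679 | 4533 |
| 2007 | Total | 1,849,528 | 676,791 | 419,727 |
| 2007 | R² | | 0.51 | 0.15 |
| 2007 | R² log(data) | | 0.75 | 0.40 |
| 2007 | RMSE | | 5923 | 7557 |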

Immediately apparent is that both estimation methodologies underestimate flows in both years quite considerably, but the new methodology places more migrants in the system and gets a little closer to the IMEM estimate. In terms of the structure of the migration system, the new estimates perform much better, with R² values of 0.44 and 0.51 against the 0.24 and 0.15 obtained with Abel's data. Normalising the skewed distribution with a log10 transformation does not change this relationship, although it does improve the R² for both sets of estimates. This is perhaps not surprising, given the aggregate structure of the migration systems implicit in the flow probabilities used to generate these estimates. Finally, the root mean squared error shows that the new estimates are also more accurate than Abel's, with lower error in both years (3679 against 4533 in 2002, and 5923 against 7557 in 2007).
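For readers who want to reproduce this kind of comparison, the three statistics can be computed along the following lines. The sketch assumes that R² is the squared Pearson correlation between estimated and reference flows, and that pairs with a zero in either series are dropped before the log10 transformation; neither choice is confirmed by the text. The flow values are invented.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)


def rmse(estimates, reference):
    """Root mean squared error of the estimates against the reference."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, reference)]
    return math.sqrt(sum(squared) / len(squared))


def log10_positive_pairs(estimates, reference):
    """Log-transform both series, keeping only pairs where both are positive."""
    pairs = [(e, r) for e, r in zip(estimates, reference) if e > 0 and r > 0]
    return [math.log10(e) for e, _ in pairs], [math.log10(r) for _, r in pairs]


# Hypothetical estimated and reference flows for five origin/destination pairs.
estimates = [5000, 1200, 300, 45, 0]
reference = [4000, 2000, 150, 60, 10]

log_estimates, log_reference = log10_positive_pairs(estimates, reference)

print("Total:", sum(estimates), "vs", sum(reference))
print("R2:", round(r_squared(estimates, reference), 3))
print("R2 log(data):", round(r_squared(log_estimates, log_reference), 3))
print("RMSE:", round(rmse(estimates, reference), 1))
```

Reporting the total alongside R² and RMSE matters because R² is insensitive to the overall level of the estimates, whereas the total and the RMSE are not.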

Overall, it is clear that the new method produces better results for Europe, but exploring the data, it is interesting to see where the errors are being made with both estimates – doing so confirms the strengths and weaknesses of both methods. For example, in 2002, the IMEM estimate for flows from Ireland into Britain is 15,230, whereas the Abel estimate is 56 and the new estimate some 27,549. In terms of the magnitude of the error, both are a long way away, but the overestimate of the new method is indicative of the historical stocks of Irish migrants in Britain. Abel's huge underestimate exposes the problems in dealing with foreign born/versus foreign national stock definitions and the complex patterns of national allegiances among people living in Northern Ireland and the Irish Republic that are very hard to capture with the data to hand. Other similar examples in the data abound (Poland to Germany migration; Italy to Switzerland) and perhaps point to the importance of historical flows when trying to predict what is more likely at another time.

7.3.3 UN Flow Data Comparison

Comparison with IMEM data suggests that these new estimates are potentially better than other global estimates currently available, but comparison with alternative flow data confirms this. Using the time series UN flow data from the Kim and Cohen (2010) paper and aggregating both the new estimates and these UN flows to 5-year totals, we can compare the flows directly with Abel's 5-year estimates (as opposed to having to disaggregate Abel's data into single years in the European validation exercise). Figure 7.4 shows the initial results of these comparisons for eight 5-year periods between 1970 and 2005. The y-axis on both sets of graphs features the UN flow data, aggregated from the 5 preceding years and displayed on a log scale. Figure 7.4a plots these UN data against the Abel estimates and Figure 7.4b plots the new estimates (again, both on a log-scale). Table 7.4 provides accompanying descriptive statistics for the relationships between the data.

Figure 7.4 Flow data estimate comparison with UN flow data, 1970–2005

Table 7.4 Descriptive statistics for flow data estimate comparison with UN flow data, 1970–2005

| 5-year period end | n | Total migrants: UN | Total migrants: new estimates | Total migrants: Abel estimates | RMSE: new estimates | RMSE: Abel estimates | R² (log10): new estimates | R² (log10): Abel estimates |
|------|------|------------|-----------|------------|--------|--------|------|------|
| 1970 | 399 | 6,692,562 | 2,027,441 | 4,386,639 | 58,248 | 47,103 | 0.77 | 0.47 |
| 1975 | 586 | 6,594,886 | 2,650,419 | 5,318,863 | 45,598 | 43,309 | 0.68 | 0.27 |
| 1980 | 1044 | 6,461,809 | 3,339,735 | 6,308,161 | 24,662 | 31,769 | 0.75 | 0.46 |
| 1985 | 1214 | 6,522,112 | 4,359,780 | 6,763,096 | 14,513 | 48,399 | 0.77 | 0.47 |
| 1990 | 1364 | 8,777,383 | 5,064,034 | 8,137,808 | 22,606 | 41,159 | 0.79 | 0.47 |
| 1995 | 1902 | 10,793,979 | 7,033,409 | 12,187,323 | 18,489 | 28,914 | 0.80 | 0.50 |
| 2000 | 1963 | 11,584,562 | 9,270,876 | 13,712,391 | 18,384 | 41,912 | 0.84 | 0.57 |
| 2005 | 1908 | 14,620,087 | 9,912,514 | 12,303,871 | 22,646 | 26,172 | 0.84 | 0.53 |

Studying Figure 7.4, it is clear for all 5-year periods that the new estimates on Figure 7.4b are generally closer to the UN estimates than Abel's estimates on Figure 7.4a – in all graphs, points are clustered far more closely to the 45° line. In every time period with Abel's estimates, larger flows (towards the right of the graph) are estimated more accurately than smaller flows, whereas with the new estimates, the patterns of over- and underestimation remain relatively stable.

Studying the goodness-of-fit statistics in Table 7.4, a number of points can be made. Firstly, it should be noted that the closer to 1970, the fewer data points are available for comparison – in each case, n relates to the number of origin/destination pairs that are commonly found in each data set – these pairs will be combinations of all possible origin countries and the 17 inflow countries mentioned earlier (inflow data being more reliable than outflow data – although similar patterns are apparent with outflow data as well).

With the new estimates, R2 values are significantly higher for every 5-year period, although this is also apparent from the graphs. On the whole, the new estimates underestimate the total flows in this section of the system, whereas the Abel estimates get a little closer. However, between 1980 and 2005, the new estimates have a much lower root mean squared error, consolidating the better R2 values present. Only for the 5-year periods ending in 1975 and 1970 are the RMSE statistics for Abel's data better.

Overall, the impression is that for a whole range of origin destination combinations over quite a lengthy time period, the new estimation methodology described above outperforms the methodology established by Abel. This evidence from empirical UN flow data, taken along with the evidence from the European IMEM estimates, points fairly conclusively to the new methodology, while very simple, outperforming any other flow estimation methodology.

7.4 Discussion

This new methodology is able to generate a long time series of annual migration flows from historical bilateral stock data and estimates of the total migrants moving around the globe in each year. The method is exceedingly simple, but is able to produce better estimates of flows than data that are currently available elsewhere in the public domain.

Just because they are better, we should not make the mistake of thinking they are actually very good. With this kind of estimation, where we are looking to maximise the plausibility of the estimates across as many origin/destination combinations as possible, it is inevitable that there will be residuals which are some way from the truth. Rather than dwelling on these individual residuals, it is perhaps more useful to think about where errors are most likely to occur. Table 7.5 shows the time series of estimated and observed flows from Italy to Australia between 1960 and 2006. The Italians form one of the largest immigrant groups in Australia, but most of this migration has been since World War II (Cresciani, 2003). The observed flows in the data we have record large numbers of migrants at the beginning of the period, but drop off quite sharply at around 1970. This is a peculiar, but not uncommon, pattern in migration histories between countries (e.g. Jamaicans to the United Kingdom), where particular political or economic conditions lead to sudden peaks of migration activity which perhaps do not represent the longer-term picture – this is what Massey et al. (1999) were referring to when commenting that stocks are not necessarily representative of flows. As is apparent in Table 7.5, the result is that we have quite large overestimation of flows the farther away from the real migration peak we get. These distortions are particularly acute as the migrants who moved from Italy to Australia up until 1970 are likely to be still alive and thus boosting the stock. Adjustments could be made for these sorts of trends using information on when these bursts of migration activity occur in parallel with life tables, but we would be rapidly moving away from a quick and generalisable migration estimation model to a far more complex beast.

Table 7.5 Estimates versus recorded flows, Italy to Australia, 1960–2006

Year   New estimate   UN data      Year   New estimate   UN data
1960   3,841          18,360       1984   6,810          562
1961   4,058          16,795       1985   6,835          676
1962   4,284          13,427       1986   6,855          589
1963   4,520          12,903       1987   6,871          530
1964   4,768          10,309       1988   6,881          421
1965   5,030          11,420       1989   6,883          306
1966   5,308          12,888       1990   6,949          353
1967   5,600          15,042       1991   6,950          285
1968   5,906          13,175       1992   6,943          283
1969   6,226          10,224       1993   6,925          286
1970   6,558          7,573        1994   6,898          336
1971   6,605          5,874        1995   6,861          304
1972   6,646          3,426        1996   6,818          272
1973   6,679          2,931        1997   6,766          201
1974   6,704          2,389        1998   6,704          197
1975   6,720          1,365        1999   6,633          168
1976   6,727          1,318        2000   6,554          181
1977   6,725          1,598        2001   6,603          267
1978   6,714          1,282        2002   6,649          139
1979   6,697          1,044        2003   6,692          186
1980   6,672          1,696        2004   6,732          195
1981   6,712          1,381        2005   6,769          187
1982   6,748          553          2006   6,807          204
1983   6,781          492
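The scale of the mismatch in Table 7.5 can be made concrete with a short calculation. The sketch below takes a handful of rows from the table (the choice of years is arbitrary) and expresses each new estimate as a multiple of the recorded UN flow; values below 1 indicate underestimation and values above 1 indicate overestimation.

```python
# Selected rows from Table 7.5: year -> (new estimate, UN recorded flow).
rows = {
    1960: (3841, 18360),
    1970: (6558, 7573),
    1980: (6672, 1696),
    1990: (6949, 353),
    2000: (6554, 181),
    2006: (6807, 204),
}

# Ratio of estimated to recorded flow for each selected year.
ratios = {year: estimate / recorded
          for year, (estimate, recorded) in rows.items()}

for year, ratio in ratios.items():
    direction = "under" if ratio < 1 else "over"
    print(f"{year}: estimate is {ratio:.2f} times the recorded flow "
          f"({direction}estimate)")
```

For these rows the estimate is well below the recorded flow in 1960, close to it in 1970, and more than thirty times the recorded flow by the 2000s.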

Further problems with the method as described earlier are the assumed linear relationships and subsequent linear interpolations which have been carried out in order to generate the estimates. For ease of implementation, it has been assumed that the number of migrants active in the world in a given year is linearly related to the numbers in previous years, although given the superlinear growth in the global population thus far and the projected logistic pattern of future global population, this may not be wise. The relatively short timescale dealt with here means that population growth looks somewhat linear, but while migration growth will be related in some way to population growth, other factors such as ease of global travel and increasing globalisation mean that superlinear growth may be more appropriate.
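The practical difference between the two assumptions can be sketched as follows. The example compares straight-line interpolation between two benchmark years with a constant-growth-rate (geometric) interpolation, which is only one of several possible superlinear forms and is not the approach used in this chapter. The benchmark years and values are hypothetical.

```python
def linear_interp(year0, value0, year1, value1, year):
    """Straight-line interpolation between two benchmark years."""
    fraction = (year - year0) / (year1 - year0)
    return value0 + fraction * (value1 - value0)


def geometric_interp(year0, value0, year1, value1, year):
    """Interpolation assuming a constant annual growth rate
    (both benchmark values must be positive)."""
    fraction = (year - year0) / (year1 - year0)
    return value0 * (value1 / value0) ** fraction


# Hypothetical benchmark totals for two years a decade apart.
linear_mid = linear_interp(1960, 100.0, 1970, 400.0, 1965)
geometric_mid = geometric_interp(1960, 100.0, 1970, 400.0, 1965)

print(f"linear: {linear_mid:.1f}, geometric: {geometric_mid:.1f}")
```

When the total is growing, the geometric path sits below the straight line between the benchmarks, so a linear assumption would allocate more migrants to the intermediate years than a constant-growth-rate assumption would.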

Another issue with this method is that it is obviously deterministic, to the extent that the R² values between the stock probabilities and any observed flows will be exactly the same as the R² values between any derived migration estimates and any observed flows. This is not ideal. It would be preferable and more realistic to include some random fluctuation in the flows, although the exact method for achieving this would need to be researched. While it would not remove the deterministic nature of the results, incorporating the bilateral stock data into a spatial interaction model with more terms would introduce a little more variation into the estimates and would certainly make the data feel a little less synthetic.
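The equality of the two R² values follows from the fact that, within a single year, every estimate is the same constant multiplied by a stock probability, and the squared correlation coefficient is unaffected by rescaling one of the variables. The sketch below checks this numerically; the probabilities, the observed flows and the annual total are all hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical stock-based probabilities and observed flows for four pairs.
probabilities = [0.50, 0.30, 0.15, 0.05]
observed = [4200, 3100, 900, 700]

# Estimates are the probabilities scaled by a hypothetical annual total.
annual_total = 20_000
estimates = [annual_total * p for p in probabilities]

r2_probabilities = r_squared(probabilities, observed)
r2_estimates = r_squared(estimates, observed)

print(f"R² (probabilities): {r2_probabilities:.6f}")
print(f"R² (estimates):     {r2_estimates:.6f}")
```

The two values agree up to floating-point rounding, which is why the fit of the estimates can never be better than the fit of the stock probabilities themselves.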

7.5 Conclusions

A new method for estimating a series of global migration flow tables has been outlined here. By using historical bilateral migrant stock data as flow probabilities, it has been shown that a very simple proportional distribution model can achieve superior migration flow estimates to the only other comparable set of flows currently available: those created by Abel (2013b), following the work of Abel (2013a). The new methodology is not perfect – it is shown to underestimate the total flows for the systems where we have comparable data and, as stocks reflect historical flows, there are also problems with over- and underestimation between particular country combinations.

This new method should be viewed more as a step in the right direction for migration flow estimation rather than a definitive solution to the problem of knowing which countries people are likely to be migrating between in any given year. As mentioned, the single probability term in the equation means that the estimated flows are directly tied to it, with no variation at all. Including additional predictor terms in the equation such as those used in a number of explanatory migration models (Cohen et al., 2008; Abel, 2010; Fotheringham et al., 2004) would produce more variation and potentially improve the estimates still further. If available, however, historical stocks should definitely be used in the estimation process.

References

  1. Abel, G.J. (2010) Estimation of international migration flow tables in Europe. Journal of the Royal Statistical Society: Series A (Statistics in Society), 173 (4), 797–825.
  2. Abel, G.J. (2013a) Estimating global migration flow tables using place of birth data. Demographic Research, 28 (18), 505–546.
  3. Abel, G.J. (2013b) Estimates of global bilateral migration flows by gender between 1960 and 2010. International Migration Review. http://www.oeaw.ac.at/vid/download/WP2015_05.pdf.
  4. Cohen, J., Roig, M., Reuman, D. and GoGwilt, C. (2008) International migration beyond gravity: A statistical model for use in population projections. Proceedings of the National Academy of Sciences, 105 (40), 15269–15274.
  5. Cresciani, G. (2003) The Italians in Australia, Cambridge University Press.
  6. de Beer, J., Raymer, J., van der Erf, R. and van Wissen, L. (2010) Overcoming the problems of inconsistent international migration data: A new method applied to flows in Europe. European Journal of Population/Revue européenne de Démographie, 26 (4), 459–481.
  7. Flowerdew, R. and Lovett, A. (1988) Fitting constrained Poisson regression models to interurban migration flows. Geographical Analysis, 20 (4), 297–307.
  8. Fotheringham, A.S., Rees, P., Champion, T. et al. (2004) The development of a migration model for England and Wales: Overview and modelling out-migration. Environment and Planning A, 36 (9), 1633–1672.
  9. Kim, K. and Cohen, J.E. (2010) Determinants of international migration flows to and from industrialized countries: A panel data approach beyond gravity. International Migration Review, 44 (4), 899–932.
  10. Massey, D.S., Arango, J., Hugo, G. et al. (1999) Worlds in Motion: Understanding International Migration at the End of the Millennium, Oxford University Press, Oxford.
  11. Özden, C., Parsons, C.R., Schiff, M. and Walmsley, T.L. (2011) Where on earth is everybody? World Bank Economic Review, 25 (1), 12–56.
  12. Raymer, J. and Abel, G. (2008a) Methods to Improve Estimates of Migration Flows - the MIMOSA Model for Estimating International Migration Flows in the European Union. Geneva: UNECE/Eurostat work session on migration statistics - Working Paper, 8.
  13. Raymer, J. and Abel, G. (2008b) The MIMOSA Model for Estimating International Migration Flows in the European Union. In Joint UNECE/Eurostat Work Session on Migration Statistics. Geneva: UNECE/Eurostat.
  14. Raymer, J., Wiśniowski, A., Forster, J.J. et al. (2013) Integrated modeling of European migration. Journal of the American Statistical Association, 108 (503), 801–819.
  15. Stillwell, J. (1978) Interzonal migration: Some historical tests of spatial-interaction models. Environment and Planning A, 10, 1187–1200.
  16. UKSA (2009) Migration Statistics: The Way Ahead, UK Statistics Authority, London.
  17. UN (2013) World Population Prospects: 2012 Revision - Metadata, United Nations Population Division.
  18. Willekens, F. (1999) Modeling approaches to the indirect estimation of migration flows: from entropy to EM. Mathematical Population Studies, 7 (3), 239–278.
  19. Wilson, A. (1971) A family of spatial interaction models, and associated developments. Environment and Planning A, 3, 1–32.
  20. Wiśniowski, A., Bijak, J., Christiansen, S. et al. (2013) Utilising expert opinion to improve the measurement of international migration in Europe. Journal of Official Statistics, 29, 583.