The Earth is not a ball

The world is not a perfect sphere. It is an ellipsoid and, like a middle-aged man, it bulges in the middle: the radius at the equator is larger than the radius at the poles. This means that simple geometry calculations for a sphere will not be entirely accurate when applied to geospatial data.

Methods that calculate distance across a sphere, such as the haversine formula for great-circle distance, become less accurate as the distance between two points increases. You can still use the haversine method for short distances without much loss in accuracy, but avoid it for longer distances.

The haversine formula. Source: Wikipedia.org
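
For reference, the formula can be written out as follows. The symbol names are a notational choice for this sketch: $\varphi_1, \varphi_2$ are the latitudes and $\lambda_1, \lambda_2$ the longitudes of the two points in radians, $r$ is the radius of the sphere, and $d$ is the great-circle distance. The R code in this section uses the same structure, with `12742` standing in for the diameter $2r$ in kilometers:

```latex
d = 2r \arcsin\left( \sqrt{ \sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right)
      + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right) } \right)
```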

The following R code calculates the haversine distance between two points on the Earth. Remember that it is not entirely accurate over long distances; for those, leverage code packages and database systems that are aware of the coordinate reference system (CRS) and can use it to calculate spatial results with precision. The R code uses two points that are far apart to give you an idea of the relative inaccuracy of the method. The actual distance between the points is 12,935 km:

#code adapted from RosettaCode.

#Coordinates for the two points
#Chicago, USA O'Hare airport (ORD)
Point1Lat = 41.978194
Point1Long = -87.907739

#Mumbai, India Chhatrapati Shivaji International Airport (BOM)
Point2Lat = 19.0895595
Point2Long = 72.8656144

#convert decimal degrees to radians
degrees_to_rad <- function(deg) (deg * pi / 180)

# Volumetric mean radius is 6371 km for the Earth, see http://nssdc.gsfc.nasa.gov/planetary/factsheet/earthfact.html
# The diameter is thus 12742 km

#function to calculate great circle distance using haversine method
#inputs are in radians; the result is in kilometers (12742 km is the Earth's mean diameter)
great_circle_distance <- function(lat1, long1, lat2, long2) {
  a <- sin(0.5 * (lat2 - lat1))
  b <- sin(0.5 * (long2 - long1))
  12742 * asin(sqrt(a * a + cos(lat1) * cos(lat2) * b * b))
}

#calculate distance for the two points
haversine_distance <- great_circle_distance(
  degrees_to_rad(Point1Lat), degrees_to_rad(Point1Long), # Chicago O'Hare airport (ORD)
  degrees_to_rad(Point2Lat), degrees_to_rad(Point2Long)) # Chhatrapati Shivaji International Airport (BOM)

#result shown in kilometers
haversine_distance
# 12,942.77 km
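
Since the rest of this chapter works in Python, the same calculation can be sketched in Python using only the standard library. This is a direct port of the R code above under the same spherical assumption, not a more accurate method. The names `haversine_km` and `STATED_ACTUAL_KM` are made up for this illustration, and the 12,935 km figure is simply the value quoted in the text, not something the code derives:

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0  # volumetric mean radius, same value as the R code


def haversine_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg,
                 radius_km=EARTH_MEAN_RADIUS_KM):
    """Great-circle distance in km between two points in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(
        math.radians, (lat1_deg, lon1_deg, lat2_deg, lon2_deg))
    a = math.sin(0.5 * (lat2 - lat1))
    b = math.sin(0.5 * (lon2 - lon1))
    central = a * a + math.cos(lat1) * math.cos(lat2) * b * b
    return 2.0 * radius_km * math.asin(math.sqrt(central))


ORD = (41.978194, -87.907739)   # Chicago O'Hare (latitude, longitude)
BOM = (19.0895595, 72.8656144)  # Mumbai Chhatrapati Shivaji (latitude, longitude)
STATED_ACTUAL_KM = 12935.0      # figure quoted in the text (assumed, not computed)

distance_km = haversine_km(*ORD, *BOM)
relative_error = (distance_km - STATED_ACTUAL_KM) / STATED_ACTUAL_KM
print(f"haversine: {distance_km:.2f} km")
print(f"difference from stated actual distance: {relative_error:.3%}")
```

The result should agree with the R output of roughly 12,942.77 km, which is about 0.06% above the stated actual distance.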

There are different methods to adjust for the actual shape of the Earth. The WGS 84 CRS uses a small set of defining parameters to model that shape and so increase the accuracy of calculations and projections based on it. The following table summarizes the parameters:

WGS 84 defining parameters. Source: United Nations Office for Outer Space Affairs
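
To make the parameters concrete, here is a minimal sketch that starts from the WGS 84 defining values as they are commonly published and derives the polar radius and eccentricity from them. The constant names are made up for this illustration, and the values are an assumption on my part that should be checked against the table before you rely on them:

```python
# WGS 84 defining parameters as commonly published (assumed values; verify
# against the official table before using them in production code).
WGS84_A_M = 6378137.0            # semi-major axis (equatorial radius), meters
WGS84_INV_F = 298.257223563      # inverse flattening, 1/f
WGS84_GM_M3_S2 = 3.986004418e14  # geocentric gravitational constant, m^3/s^2
WGS84_OMEGA_RAD_S = 7.292115e-5  # angular velocity of the Earth, rad/s

# Quantities derived from the two geometric parameters.
flattening = 1.0 / WGS84_INV_F
semi_minor_m = WGS84_A_M * (1.0 - flattening)  # polar radius, b = a(1 - f)
ecc_sq = flattening * (2.0 - flattening)       # first eccentricity squared
bulge_km = (WGS84_A_M - semi_minor_m) / 1000.0

print(f"semi-minor axis b: {semi_minor_m:.3f} m")
print(f"eccentricity squared: {ecc_sq:.12f}")
print(f"equatorial bulge (a - b): {bulge_km:.3f} km")
```

The difference between the equatorial and polar radii comes out at a little over 21 km, which is the bulge that a spherical model ignores.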

Although you will mostly come across two-dimensional coordinates for the WGS 84 CRS, it is a three-dimensional representation of the Earth. Its origin is the Earth's center of mass. The following image from the Defense Mapping Agency shows how this is represented:

WGS Reference Frame By Defense Mapping Agency - Section 1-5 PDF of the DMA TECHNICAL REPORT TR8350.2-b - (Second Printing, 1 December 1987) Supplement to DoD WGS 84 Technical Report Part 2 - Parameters, Formulas, and Graphics. http://earth-info.nga.mil/GandG/publications/tr8350.2/TR8350.2-b/Sections%201-5.pdf, Public Domain, https://commons.wikimedia.org/w/index.php?curid=41796676
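
One way to see the three-dimensional nature of the reference frame is to convert a latitude, longitude, and height into Cartesian X, Y, Z coordinates measured from the Earth's center (often called Earth-centered, Earth-fixed or ECEF coordinates). The sketch below uses the standard ellipsoid conversion with the commonly published WGS 84 semi-major axis and inverse flattening. The function name `geodetic_to_ecef` is made up for this illustration, and in practice a CRS-aware library should do this for you:

```python
import math

WGS84_A_M = 6378137.0        # semi-major axis in meters (assumed published value)
WGS84_INV_F = 298.257223563  # inverse flattening (assumed published value)


def geodetic_to_ecef(lat_deg, lon_deg, height_m=0.0):
    """Convert geodetic coordinates to Earth-centered X, Y, Z in meters."""
    f = 1.0 / WGS84_INV_F
    e2 = f * (2.0 - f)  # first eccentricity squared
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Radius of curvature in the prime vertical at this latitude.
    n = WGS84_A_M / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + height_m) * math.sin(lat)
    return x, y, z


# Chicago O'Hare (ORD) at ellipsoid height zero.
x, y, z = geodetic_to_ecef(41.978194, -87.907739)
center_distance_km = math.sqrt(x * x + y * y + z * z) / 1000.0
print(f"X={x:.1f} m, Y={y:.1f} m, Z={z:.1f} m")
print(f"distance from Earth's center: {center_distance_km:.3f} km")
```

A point on the equator at longitude zero maps to the semi-major axis along X, and a point at the pole maps to the semi-minor axis along Z, so every surface point lies between those two distances from the center.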

The point of highlighting this detail is not only to give an idea of the complexity behind map projections but also to convince you to leverage the centuries of work already completed in this field. Do not attempt to duplicate it; take advantage of what has already been done.

A great way to take advantage of what has already been done in geospatial analytics is to use Python geospatial libraries. There are many great packages for R that can be used as well, but there are more options with Python. The Python libraries also tend to be more mature due to the long use of Python in the geospatial community.

Python also tends to scale better than R, so it can be a better fit for large-scale, compute-intensive processing. Geospatial calculations can get fairly intensive, and IoT data, as we know well by this point in the book, grows to a large scale in a short amount of time.

Due to these considerations, Python is a great fit for geospatial analytics. In this chapter, we will shift our focus from R code to Python code. For big data analytics, you should be comfortable with both. An easy way to get started with Python is to download the Anaconda distribution from https://www.continuum.io/downloads. Anaconda includes Python, R, and over 720 packages, including more than 100 of the most popular packages for data science.

Anaconda also includes both a Python IDE called Spyder and the browser-based notebook, Jupyter. Jupyter is important for IoT analytics because it can be run on a Hadoop cluster, allowing you to develop Python code in a distributed environment without having to run code through a console. It can also be used to develop Spark applications in Python, Scala, or R. The combination of Spark and Python geospatial libraries, installed across the cluster, provides the capability to scale geospatial analytics.
