Geospatial analysis in the big data world

Data volume and velocity both pose challenges for geospatial analytics. A dataset can easily be too large to analyze with a desktop GIS tool, and it may even be too large to handle effectively in a relational database with spatial extensions. Because geospatial functions are computationally intensive, near real-time response can also be a challenge.

Several tools built specifically with big data in mind offer options for geospatial analysis. Elasticsearch is an open source distributed search engine. It can scale from one server to hundreds of servers, and it has some spatial search functions. For example, you can search for locations within a certain distance of a latitude and longitude point. AWS offers a managed Elasticsearch service, so there is no need to worry about managing servers.
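As a rough illustration of the distance search just described, the following sketch builds the request body for an Elasticsearch `geo_distance` filter. It assumes the index has a field mapped as `geo_point`; the field name `location`, the helper name `build_geo_distance_query`, and the sample coordinates are hypothetical and used only for illustration. The sketch only constructs the JSON body and does not contact a cluster.

```python
import json


def build_geo_distance_query(field, lat, lon, distance):
    """Build a query body that matches documents whose geo_point `field`
    lies within `distance` (for example "5km") of the given point."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        # The key is the name of the geo_point field (assumed here).
                        field: {"lat": lat, "lon": lon},
                    }
                },
            }
        }
    }


# Hypothetical example: documents within 5 km of a point in Seattle.
query = build_geo_distance_query("location", 47.6062, -122.3321, "5km")
print(json.dumps(query, indent=2))
```

The resulting body would typically be sent to the index's `_search` endpoint using whichever HTTP or Elasticsearch client you prefer.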

AWS also has a managed petabyte-scale data warehouse service called Redshift. This was introduced in Chapter 3, IoT Analytics for the Cloud. At the time of writing, Redshift does not support geometry fields directly but does support Python user-defined functions (UDFs). You can create UDFs using Python code and the shapely package, then call them from Redshift SQL statements. A similar strategy can be used for both Hive and Spark.
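To give a feel for the kind of logic such a UDF might wrap, here is a minimal sketch of a point-in-polygon test. It is an assumption-laden stand-in: it uses a plain ray-casting algorithm instead of shapely so that it runs without extra packages, and the function names and the "lon lat, lon lat, ..." text format for the polygon are hypothetical. Inside a real UDF with shapely installed, the same check could likely be delegated to shapely's `Point` and `Polygon` objects instead.

```python
def parse_ring(coords_text):
    """Parse 'lon lat, lon lat, ...' into a list of (lon, lat) float tuples.

    The text format is an assumption for this sketch; scalar string
    arguments are a convenient way to pass a shape into a SQL UDF.
    """
    ring = []
    for pair in coords_text.split(","):
        lon_text, lat_text = pair.split()
        ring.append((float(lon_text), float(lat_text)))
    return ring


def point_in_polygon(lon, lat, coords_text):
    """Return True if the point lies inside the polygon (ray casting).

    This is a dependency-free stand-in for a shapely containment check.
    Points exactly on an edge are not handled specially.
    """
    ring = parse_ring(coords_text)
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge from vertex j to vertex i straddle the
        # horizontal line through the point?
        if (yi > lat) != (yj > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


# Hypothetical square region used only for illustration.
square = "0 0, 10 0, 10 10, 0 10"
print(point_in_polygon(5.0, 5.0, square))
print(point_in_polygon(15.0, 5.0, square))
```

In Redshift, a function body like this would be registered with a `CREATE FUNCTION ... LANGUAGE plpythonu` statement and then called from SQL with the longitude, latitude, and polygon text as arguments.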

ESRI supports an open source project called GP tools for AWS that allows ArcGIS users to connect to Amazon EMR and S3 data sources. The project is hosted on GitHub (https://github.com/Esri/gptools-for-aws).
