Making data homogeneous

What connects the representation of data to real-life objects is simply the combination of a feature's geometry with its properties.

A line, for example, can be a road, a river, a fence, and so on. The only difference may be a type property that tells us what it is. Alternatively, we may have a file named roads, which tells us that it contains roads.

However, the computer doesn't know any of this, because it doesn't know what the other properties represent or what the file is. Because of this, we need to transform the data into a common format that can be analyzed.
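To make the idea concrete, here is a minimal sketch that uses plain dictionaries instead of real files. The two sources, the `FILE_KINDS` mapping, and the `from_source_a`/`from_source_b` functions are all hypothetical: one source stores the meaning of each line in a type property, the other only implies it through the file name, and both are converted into one assumed common format.

```python
# Hypothetical raw data: two sources describing lines in different ways.
# Source A stores the meaning of each line in a "type" property.
source_a = [
    {"geometry": [(0, 0), (1, 1)], "properties": {"type": "road", "name": "Main St"}},
    {"geometry": [(2, 0), (3, 4)], "properties": {"type": "river", "name": "Blue River"}},
]
# Source B has no type property; the meaning is only implied by the file name.
source_b = {"file_name": "roads", "records": [([(5, 5), (6, 7)], "Side St")]}

# Assumed mapping from file names to the kind of object they contain.
FILE_KINDS = {"roads": "road", "rivers": "river", "fences": "fence"}


def from_source_a(features):
    """Converts source A features into the common format."""
    return [{"kind": f["properties"]["type"],
             "name": f["properties"]["name"],
             "geometry": f["geometry"]}
            for f in features]


def from_source_b(data):
    """Converts source B records, taking the kind from the file name."""
    kind = FILE_KINDS[data["file_name"]]
    return [{"kind": kind, "name": name, "geometry": geometry}
            for geometry, name in data["records"]]


# Once converted, both sources can be analyzed with the same code.
common = from_source_a(source_a) + from_source_b(source_b)
roads = [item["name"] for item in common if item["kind"] == "road"]
print(roads)  # ['Main St', 'Side St']
```

The query at the end does not care where each road came from; that independence from the original format is what the common representation buys us.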

This common format is the subject of this topic: how data can be represented in Python in an optimal way, so that the objects can be manipulated and analyzed to produce the expected results.

The objective is to transform the basic data representation of features, geometries, and properties into a representation of real-life objects, hiding the underlying functionality in the process. In computer science, this is called abstraction.

Instead of just writing some prepared code that magically performs the transformation, we will go step by step through the process of deducing how the transformation needs to be done. This is very important because it is the foundation for developing code that performs any kind of transformation on any type of geographic data you may use in the future.

The concept of abstraction

Now that we have a clear understanding of how data is represented, let's get back to our geocaching application.

Abstraction is a programming technique intended to reduce the complexity of code for the programmer. It's done by encapsulating complex code under progressive layers of more human-friendly solutions. The lower the level of abstraction, the closer the code is to machine language and the harder it is to maintain. The higher the level of abstraction, the more the code mimics the behavior of real things or resembles natural language, making it more intuitive and easier to maintain and extend.
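As a small illustration of these layers, the following sketch computes the same planar (Euclidean, not geodesic) distance three times: with raw arithmetic, through a function, and through a method on an object. The `distance` function and the `Point` class are hypothetical names chosen for this example only.

```python
import math

# Lowest layer shown here: raw arithmetic on bare numbers.
dx, dy = 3 - 0, 4 - 0
raw_distance = math.sqrt(dx * dx + dy * dy)


# One layer up: a function hides the formula behind a descriptive name.
def distance(p1, p2):
    """Planar Euclidean distance between two (x, y) tuples."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


# Higher still: an object whose method reads like the thing it models.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance_to(self, other):
        return distance((self.x, self.y), (other.x, other.y))


home = Point(0, 0)
cache = Point(3, 4)
print(raw_distance, distance((0, 0), (3, 4)), home.distance_to(cache))  # 5.0 5.0 5.0
```

Each layer produces the same result, but `home.distance_to(cache)` is the one that reads closest to plain language.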

Going back to the examples we have seen so far, we can notice many levels of abstraction, for example, when we use the OGR library in the function that opens shapefiles. Take a look at the following code:

from osgeo import ogr


def open_vector_file(file_path):
    """Opens a vector file compatible with OGR, gets the first layer,
    and returns the OGR data source.

    :param str file_path: The full path to the file.
    :return: The OGR data source.
    """
    datasource = ogr.Open(file_path)
    # ogr.Open returns None if the file can't be opened.
    if datasource is None:
        raise IOError("Could not open {}".format(file_path))
    layer = datasource.GetLayerByIndex(0)
    print("Opening {}".format(file_path))
    print("Number of features: {}".format(layer.GetFeatureCount()))
    return datasource

At the uppermost layer of abstraction, we have the function itself, which hides the functionality of OGR. Below it are the OGR Python bindings, which abstract the OGR C API, which in turn handles memory allocation, all the mathematics, and so on.

Abstracting the geocache point

So, we need to handle multiple sources of data in a smart way so that:

  • We don't need to change the code for each type of data
  • It's possible to combine data from multiple sources
  • If we add extra functionality to our program, we don't need to worry about file formats and data types

How will we do this? The answer is simple: we will abstract our data and hide the handling of formats and data types inside the internal functionality.

The objective is that after this point in the app, we won't need to deal with OGR, layers, features, and so on. We will have one and only one type of object that we will use to represent our data, and all the interaction will be done with this object. The geocache object will represent a single geocaching point with the properties and methods that can be used to manipulate this object.
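One common way to meet the goals listed above is to pick a reader by file extension, so that the rest of the program only ever sees points. The sketch below assumes two simple text formats and hypothetical functions (`read_csv`, `read_txt`, `load_points`) using only the standard library; in the geocaching app, the readers would instead be built on OGR.

```python
import csv
import io
import os


def read_csv(text):
    """Reads 'x,y' rows (with a header line) into (x, y) float tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["x"]), float(row["y"])) for row in reader]


def read_txt(text):
    """Reads whitespace-separated 'x y' lines into (x, y) float tuples."""
    points = []
    for line in text.splitlines():
        if line.strip():
            x, y = line.split()
            points.append((float(x), float(y)))
    return points


# Registry: supporting a new format means adding one entry here,
# not changing the code that consumes the points.
READERS = {".csv": read_csv, ".txt": read_txt}


def load_points(file_name, text):
    """Chooses a reader from the file extension and returns (x, y) tuples."""
    extension = os.path.splitext(file_name)[1].lower()
    reader = READERS.get(extension)
    if reader is None:
        raise ValueError("Unsupported format: {}".format(extension))
    return reader(text)


# Points from two different formats end up in one plain list.
points = (load_points("caches.csv", "x,y\n20,40\n21,41\n")
          + load_points("more_caches.txt", "22 42\n"))
print(points)  # [(20.0, 40.0), (21.0, 41.0), (22.0, 42.0)]
```

The calling code never branches on the format, which is what lets data from several sources be combined freely.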

Now, perform the following steps:

  1. First, let's organize the project structure. Open your geopy project in PyCharm and create a directory named Chapter3.
  2. Copy all the files and directories from Chapter2 to Chapter3. You should end up with a structure similar to the following:
    +---Chapter3
    |   |   geocaching_app.py
    |   |   __init__.py
    |   |   
    |   +---experiments
    |   |       import_test.py
    |   |       module_test.py
    |   |       
    |   \---utils
    |           data_transfer.py
    |           geo_functions.py
    |           __init__.py
  3. Inside Chapter3, create a new file named models.py (from this point on, we will work inside the Chapter3 directory).
  4. Now, add this code to the file:
    class Geocache(object):
        """This class represents a single geocaching point."""
        
        def __init__(self, x, y):
            self.x = x
            self.y = y
        
        @property
        def coordinates(self):
            return self.x, self.y
  5. Now we have a Geocache class with its first property: the coordinates of the geocache. To test our class, we can write the following code:
    if __name__ == '__main__':
        one_geocaching_point = Geocache(20, 40)
        print(one_geocaching_point.coordinates)
  6. To run your code, press Alt + Shift + F10 and select the models file. You should get this output in the console:
    (20, 40)
    
    Process finished with exit code 0
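Because coordinates is declared with @property, it is computed each time it is read and, since no setter is defined, it cannot be assigned directly. The following sketch repeats the Geocache class from step 4 so that it runs on its own and shows both behaviors; the exact AttributeError message differs between Python versions, so it is not reproduced here.

```python
class Geocache(object):
    """This class represents a single geocaching point."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def coordinates(self):
        return self.x, self.y


point = Geocache(20, 40)
point.x = 25                  # coordinates is computed on every access,
print(point.coordinates)      # so this prints (25, 40)

try:
    point.coordinates = (0, 0)  # no setter was defined for the property
except AttributeError as error:
    print("coordinates is read-only:", error)
```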

Abstracting geocaching data

Now that we have a single point, we also need a collection of points. We will call it PointCollection. Continuing the process of abstraction, the objective is to hide the operations of importing and converting the data. We will do this by creating a new class and encapsulating some of our utility functions inside it. Go to your models.py file and add the following class:

class PointCollection(object):
    """This class represents a group of vector data."""

    def __init__(self):
        self.data = []

It's a simple class definition; in the __init__ method, we define that each instance of this class has a data attribute. Now that we have created our simple abstractions, let's add functionality to them.
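To show how a collection can serve the goal of combining data from multiple sources, here is a minimal sketch that stores Geocache objects in PointCollection.data. The `add_point` and `merge` methods, as well as `__len__` and `__iter__`, are hypothetical additions for illustration only; they are not the functionality the chapter goes on to build.

```python
class Geocache(object):
    """This class represents a single geocaching point."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def coordinates(self):
        return self.x, self.y


class PointCollection(object):
    """This class represents a group of vector data."""

    def __init__(self):
        self.data = []

    # Hypothetical helper methods, for illustration only.
    def add_point(self, point):
        """Appends a single Geocache to the collection."""
        self.data.append(point)

    def merge(self, other):
        """Appends every point from another PointCollection."""
        self.data.extend(other.data)

    def __len__(self):
        return len(self.data)

    def __iter__(self):
        return iter(self.data)


# Two collections, as if loaded from two different sources.
from_file_a = PointCollection()
from_file_a.add_point(Geocache(20, 40))
from_file_b = PointCollection()
from_file_b.add_point(Geocache(21, 41))

from_file_a.merge(from_file_b)
print(len(from_file_a), [p.coordinates for p in from_file_a])  # 2 [(20, 40), (21, 41)]
```

Implementing `__len__` and `__iter__` lets the collection be used with `len()` and in `for` loops like a built-in sequence, which keeps calling code close to natural language.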
