What ties the representation of data to real-life objects is simply the combination of geometry with the properties of a feature. A line, for example, can be a road, a river, a fence, and so on. The only difference may be the type property that tells us what it is. Alternatively, we may have a file named roads that lets us know that it contains roads.
However, the computer doesn't know any of this: it doesn't know what the properties represent or what the file is. Because of this, we need to transform the data into a common format that can be analyzed.
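To make this concrete, here is a minimal sketch, assuming features are stored as plain dictionaries. The `road` and `river` names and the dictionary layout are illustrative only, not a format used by the application. Both features carry the same kind of geometry; only the `type` property says what each one is, and the code has to be told explicitly which property to read.

```python
# Two hypothetical features with the same kind of geometry (a line).
road = {
    "geometry": {"type": "LineString", "coordinates": [(0, 0), (10, 0)]},
    "properties": {"type": "road", "name": "Main Street"},
}
river = {
    "geometry": {"type": "LineString", "coordinates": [(0, 5), (10, 7)]},
    "properties": {"type": "river", "name": "Blue River"},
}

# Looking at the geometry alone, both are just lines...
same_geometry_kind = road["geometry"]["type"] == river["geometry"]["type"]

# ...so the code must be told which property carries the meaning.
kinds = [feature["properties"]["type"] for feature in (road, river)]

print(same_geometry_kind)  # True
print(kinds)               # ['road', 'river']
```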
This common format is the subject of this topic: how data can be represented in Python in a form in which the objects can be manipulated and analyzed to produce the expected results.
The objective is to transform the basic data representation of features, geometries, and properties into a representation of real-life objects and hide the details of the functionality under the hood in this process. In computer science, this is called abstraction.
Instead of just writing some prepared code that magically performs the transformation, we will work out step by step how the transformation needs to be done. This is very important because it is the foundation for developing code that performs any kind of transformation on any type of geographic data you may need to handle in the future.
Now that we have a clear understanding of how data is represented, let's get back to our geocaching application.
Abstraction is a programming technique intended to reduce the complexity of code for the programmer. It's done by encapsulating complex code under progressive layers of more human-friendly solutions. The lower the level of abstraction, the closer to the machine language and the harder to maintain it is. The higher the level of abstraction, the more the code tries to mimic the behavior of real things or the more it resembles a natural language, thus becoming more intuitive and easier to maintain and extend.
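As a small illustration of two levels of abstraction, the sketch below computes the same distance twice. The `Point` class and its `distance_to` method are hypothetical helpers written for this example; they are not part of the application.

```python
import math

# Lower level: the caller must know the tuple layout and the formula.
start = (0.0, 0.0)
end = (3.0, 4.0)
low_level = math.sqrt((end[0] - start[0]) ** 2 + (end[1] - start[1]) ** 2)


class Point(object):
    """Hypothetical helper that hides the formula behind a method."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance_to(self, other):
        return math.hypot(other.x - self.x, other.y - self.y)


# Higher level: the code reads closer to the question being asked.
high_level = Point(0.0, 0.0).distance_to(Point(3.0, 4.0))

print(low_level, high_level)  # 5.0 5.0
```

Both lines produce the same number, but the second version can change its internals without affecting the code that calls it.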
Going back to the examples that we saw so far, we may notice many levels of abstraction—for example, when we use the OGR library in the function we use to open shapefiles. Take a look at the following code:
    def open_vector_file(file_path):
        """Opens a vector file compatible with OGR, gets the first layer,
        and returns the OGR datasource.

        :param str file_path: The full path to the file.
        :return: The OGR datasource.
        """
        datasource = ogr.Open(file_path)
        layer = datasource.GetLayerByIndex(0)
        print("Opening {}".format(file_path))
        print("Number of features: {}".format(layer.GetFeatureCount()))
        return datasource
At the uppermost layer of abstraction, we have the function itself, which hides the functionality of OGR. Below it, we have the OGR Python bindings, which abstract the OGR C API, which in turn handles memory allocation, all the mathematics, and so on.
So, we need to handle multiple sources of data in a smart way so that:
How will we do this? The answer is simple: we will abstract our data and hide the process of format and type handling in the internal functionality.
The objective is that after this point in the app, we won't need to deal with OGR, layers, features, and so on. We will have one and only one type of object that we will use to represent our data, and all the interaction will be done with this object. The geocache object will represent a single geocaching point with the properties and methods that can be used to manipulate this object.
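The idea can be sketched as follows, under loud assumptions: the `geocache_from_record` function and the `lon`/`lat` and `x`/`y` key names are invented for this illustration, and the `Geocache` class here is only a minimal stand-in for the one built in the steps that follow. The point is that the differences between sources are handled in one place, and everything after that sees a single type of object.

```python
class Geocache(object):
    """Minimal stand-in for the single object type the app will use."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def geocache_from_record(record):
    """Build a Geocache from a record, whatever key names it uses.

    The key names below are assumptions made for this sketch.
    """
    if "lon" in record and "lat" in record:
        return Geocache(record["lon"], record["lat"])
    if "x" in record and "y" in record:
        return Geocache(record["x"], record["y"])
    raise ValueError("Unrecognized record layout: {}".format(sorted(record)))


# Records as they might arrive from two differently structured sources.
records = [{"lon": -46.6, "lat": -23.5}, {"x": 20, "y": 40}]
geocaches = [geocache_from_record(r) for r in records]

# From here on, the rest of the code only sees Geocache objects.
print([(g.x, g.y) for g in geocaches])  # [(-46.6, -23.5), (20, 40)]
```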
Now, perform the following steps:
1. Create a directory named Chapter3.
2. Copy the contents of Chapter2 to Chapter3. You should end up with a structure similar to the following:

    +---Chapter3
    |   |   geocaching_app.py
    |   |   __init__.py
    |   |
    |   +---experiments
    |   |       import_test.py
    |   |       module_test.py
    |   |
    |   \---utils
    |           data_transfer.py
    |           geo_functions.py
    |           __init__.py

3. In Chapter3, create a new file named models.py (from this point on, we will work inside the Chapter3 directory).
4. Add the following class to the file:

    class Geocache(object):
        """This class represents a single geocaching point."""
        def __init__(self, x, y):
            self.x = x
            self.y = y

        @property
        def coordinates(self):
            return self.x, self.y

5. To test the class, add the following code at the end of the file:

    if __name__ == '__main__':
        one_geocaching_point = Geocache(20, 40)
        print(one_geocaching_point.coordinates)

6. Run the file. You should see the following output:

    (20, 40)

    Process finished with exit code 0
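Because `coordinates` is declared with `@property` and no setter, it is recomputed from `x` and `y` every time it is read, and it cannot be assigned to directly. The short sketch below repeats the `Geocache` class so that it runs on its own; the `updated` and `assignment_failed` variables are only there for the demonstration.

```python
class Geocache(object):
    """This class represents a single geocaching point."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def coordinates(self):
        return self.x, self.y


point = Geocache(20, 40)
point.x = 25
updated = point.coordinates  # recomputed from x and y on every access

# No setter was defined, so assigning to the property is rejected.
try:
    point.coordinates = (0, 0)
    assignment_failed = False
except AttributeError:
    assignment_failed = True

print(updated)            # (25, 40)
print(assignment_failed)  # True
```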
Now that we can represent a single point, we also need a collection of points. We will call this PointCollection. Continuing the process of abstraction, the objective is to hide the operations of importing and converting the data. We will do this by creating a new class and encapsulating some of our utility functions inside it. Go to your models.py file and add the following class:
    class PointCollection(object):
        """This class represents a group of vector data."""
        def __init__(self):
            self.data = []
It's a simple class definition; in the __init__ method, we define that each instance of this class will have a data property. Now that we have created our simple abstractions, let's add functionality to them.
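Creating the list inside `__init__` means that every instance gets its own `data` list rather than sharing one. A minimal sketch of that behavior, using plain tuples as stand-ins for real geocache objects:

```python
class PointCollection(object):
    """This class represents a group of vector data."""

    def __init__(self):
        self.data = []


first = PointCollection()
second = PointCollection()

# Tuples stand in for real geocache objects in this sketch.
first.data.append((20, 40))

# Only the first collection was changed.
print(len(first.data), len(second.data))  # 1 0
```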