When data lakes turn into data swamps

Swamps form when water flows into an area, collects, and stagnates, and algae cover the surface. When a data lake has a mass of raw data flowing in but no organization and too little use to stir the waters, it becomes what is facetiously referred to as a data swamp.

This often happens when the decision is made to copy data, unchanged, from many systems into one area, such as HDFS. Analysts can find the data difficult to access due to security restrictions. Even when they do have access to it, they find it difficult to make sense of the plethora of (typically) relational tables, with no key to what the ID field codes mean or what business logic should be applied when working with the data.

They often give up in frustration. In such a case, a huge amount of potentially valuable data festers, unused, eating up storage space and money.
