Data refineries

Well-functioning data lakes are not uniform pools of data. They are more like a system of petroleum refineries. In such a system, raw West Texas Intermediate crude arrives by pipeline or supertanker and flows into several different refineries. The refineries then process it into a range of refined petroleum-based products, from gasoline to petroleum jelly and even animal feed.

Sometimes, the output of one refinery becomes an input to another, which processes it further and combines it with other refined products. Not to take the analogy too far, but nobody takes crude oil out of the ground and puts it straight into their gas tank.

A very similar process happens with data refineries. Nobody uses the raw crude data; they use the high-value processed and finished products. The following diagram demonstrates the concept:

Data refineries operating inside your data lake
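
To make the chaining idea concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not a prescribed implementation: the sample records and the function names (clean_refinery, totals_refinery, report_refinery) are hypothetical and invented for this example.

```python
from collections import defaultdict

# Hypothetical raw "crude" records, as they might land in the lake.
RAW_EVENTS = [
    {"user": " Alice ", "amount": "10.50"},
    {"user": "bob", "amount": "n/a"},
    {"user": "alice", "amount": "4.50"},
    {"user": "Bob", "amount": "20"},
]


def clean_refinery(raw_events):
    """First refinery: normalize user names, parse amounts, drop bad rows."""
    cleaned = []
    for event in raw_events:
        try:
            amount = float(event["amount"])
        except ValueError:
            continue  # unparseable crude is discarded here
        cleaned.append({"user": event["user"].strip().lower(), "amount": amount})
    return cleaned


def totals_refinery(cleaned_events):
    """Second refinery: its input is the first refinery's output."""
    totals = defaultdict(float)
    for event in cleaned_events:
        totals[event["user"]] += event["amount"]
    return dict(totals)


def report_refinery(totals, cleaned_events):
    """Third refinery: combines two refined products into a finished one."""
    counts = defaultdict(int)
    for event in cleaned_events:
        counts[event["user"]] += 1
    return [
        {"user": user, "total": total, "orders": counts[user]}
        for user, total in sorted(totals.items())
    ]


cleaned = clean_refinery(RAW_EVENTS)
totals = totals_refinery(cleaned)
report = report_refinery(totals, cleaned)
```

Consumers would read only report (or totals), never RAW_EVENTS directly, which mirrors the point that nobody uses the raw crude.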