Core architecture principles of a Data Lake

We have already touched on some of the core principles that we followed while implementing the Data Lake. However, we did not state them explicitly, because presenting them upfront can be daunting and may not mean much when you are just stepping into a Data Lake implementation. Now that you have a basic Data Lake working, it is a good time to bring these core principles together; we feel they should always be kept in mind when bringing new capabilities and technologies into your Data Lake ecosystem. This list is in no way authoritative; rather, these are guiding principles that we found quite useful.

  • Accept any data in raw format (immutable data) into the Data Lake. All data in an enterprise has value attached to it. Don't try to extract that value at the first go; rather, just ingest the data and derive its value over time.
  • At the time of ingestion, don't look for value in the data being ingested.
  • Be ready to accept any type of data (structured and unstructured).
  • Be ready to accept any quantity of data.
  • Don't let the way data is stored restrict the ways in which you can query it from the Data Lake. Bring in varied technologies, as requirements dictate, for different kinds of analysis.
  • Give enterprise applications an easy way to ingest data. Initially this data might not make much sense, but over a period of time it could be combined with other data elements in the Data Lake and result in value propositions for the enterprise.
  • Don't worry about data normalization while storing.
  • Adding a data source should be quick, easy, and cheap (highly scalable).
  • Should be able to serve Enterprise data in various formats as required by consuming applications.
  • Should support data intelligence requirements with data aggregation and processing at scale.
  • Should be able to de-duplicate and cleanse the data, either in motion or at rest.
  • Should be able to support various security mechanisms for data in flight as well as data at rest.
  • Must be highly available as it serves critical Enterprise data.
  • Don't force incoming data to change its format to match your own data format; rather, accept the data in the form in which it arrives.
  • Try as many ways as possible to reduce data size and network bandwidth requirements, using techniques such as compression.
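
Several of the principles above (accept raw, immutable data in whatever format it arrives; de-duplicate at rest; reduce size with compression) can be illustrated together. The following is only a minimal sketch in Python, not the implementation used in this book: the functions `ingest_raw` and `read_raw`, the `raw/<source>/` directory layout, and the choice of SHA-256 and gzip are all assumptions made for illustration, and a local directory stands in for the Data Lake's storage layer.

```python
import gzip
import hashlib
from pathlib import Path


def ingest_raw(payload: bytes, source: str, lake_root: Path) -> Path:
    """Store a payload in the raw zone without interpreting its content.

    Hypothetical helper: the file name is the SHA-256 digest of the
    original bytes, so an exact duplicate maps to the same path.
    """
    digest = hashlib.sha256(payload).hexdigest()
    target = lake_root / "raw" / source / f"{digest}.gz"
    if target.exists():
        # Exact duplicate: raw objects are immutable, so never overwrite.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    # Compress to reduce storage size; the payload itself is not altered.
    target.write_bytes(gzip.compress(payload))
    return target


def read_raw(path: Path) -> bytes:
    """Return the original bytes exactly as they were ingested."""
    return gzip.decompress(path.read_bytes())
```

For example, calling `ingest_raw(b'{"id": 1}', "crm", Path("/tmp/lake"))` twice returns the same path and leaves a single file on disk, and `read_raw` on that path gives back the original bytes unchanged. Only exact duplicates are caught this way; cleansing and near-duplicate detection would belong to later processing.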