A reliable and efficient data repository is the heart of a distributed system. When such a repository is built for analytics, it is also called a data lake. A data repository brings together data from different domains into a single location. Let's start by understanding the issues that arise when storing data in a distributed repository.