Swamps form when water flows into an area where it collects and stagnates. Algae covers the surface. When a data lake has a mass of raw data flowing in but no organization, and too little use of that data to stir the waters, it becomes what is facetiously referred to as a data swamp.
This often happens when the decision is made to copy data from many systems into one area, such as HDFS, without any changes to it. Analysts can find the data difficult to access due to security restrictions. Even when they do have access to it, they find it difficult to make sense of the plethora of (typically) relational tables, with no key explaining what the ID field codes mean or what business logic should be used when working with the data.
They often give up in frustration. In such a case, a huge amount of potentially valuable data festers, unused, eating up storage space and money.
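
The missing "key" described above can be as simple as a small data dictionary kept alongside the raw tables. The following is a minimal sketch in Python, not a prescribed approach: the table name `orders`, the columns `status_id` and `cust_id`, the code values, and the helpers `decode_row` and `undocumented_columns` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnDoc:
    """Documentation for one column: what it means and how to read its codes."""
    description: str
    codes: dict = field(default_factory=dict)  # raw code -> human-readable meaning


# Hypothetical data dictionary for a single table copied into the lake.
DATA_DICTIONARY = {
    "orders": {
        "status_id": ColumnDoc(
            "Order lifecycle state",
            {1: "placed", 2: "shipped", 3: "returned"},
        ),
        "cust_id": ColumnDoc("Foreign key to customers.id"),
    }
}


def decode_row(table, row):
    """Replace documented codes in a raw row with their meanings."""
    docs = DATA_DICTIONARY.get(table, {})
    decoded = {}
    for column, value in row.items():
        doc = docs.get(column)
        if doc is not None and value in doc.codes:
            decoded[column] = doc.codes[value]
        else:
            # No documented code for this value: leave it unchanged.
            decoded[column] = value
    return decoded


def undocumented_columns(table, columns):
    """List the columns an analyst would have to guess at."""
    docs = DATA_DICTIONARY.get(table, {})
    return [c for c in columns if c not in docs]
```

Calling `decode_row("orders", {"status_id": 2, "cust_id": 17})` would turn the opaque status code into `"shipped"`, and `undocumented_columns` gives a quick list of the columns that still have no entry in the dictionary.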