CHAPTER 9

Data Glossary

Buzzwords

Evolving Concepts

Data is an evolving field, so new concepts are constantly being introduced. We have covered several of them in this book. The following are some additional data concepts for handy reference. Knowing the differences between them can help you better match your needs with what’s available. This book has focused on explaining the data concepts you need to work efficiently with data teams (what it is and how to do it). This chapter provides a brief introduction to a few underlying data architecture terms (where is my data stored?).

Our data ecosystem is continuously transforming: technology innovation, evolving best practices, and enormous market hype keep changing existing concepts and introducing new ones. It can be a struggle to keep new terms straight and to separate real value from hype. Often it is not clear whether a new concept can replace existing methods or whether it simply complements existing technologies to achieve better results. Most enterprise organizations can handle the majority of their use cases with a data warehouse and a data lake, without overthinking data mesh and data fabric (discussed later), although these newer concepts do offer some benefits. No organization can implement every new technology, however. To offer a data experience free of disruption, an organization needs to evaluate whether the benefits of a new concept outweigh the effort required to implement it.

Data Warehouse

Think of a data warehouse as a home for structured data. It is one of the oldest concepts in data storage: a central repository for structured data drawn from various disparate sources. A data warehouse is optimized mainly for business intelligence, such as reporting and data analysis. Popular techniques for building a data warehouse are ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform). In short, a data warehouse provides historical data for analytical processing and data mining tools.105
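The difference between ETL and ELT is only the order of the steps, which a small sketch can make concrete. The example below is illustrative rather than a real warehouse pipeline: it uses Python's built-in sqlite3 module as a stand-in for the warehouse, and the sample records, table names, and function names are all assumptions made up for this illustration.

```python
import sqlite3

# Hypothetical raw records from a source system (amounts arrive as strings).
RAW_ORDERS = [
    ("north", "10.50"),
    ("south", "4.25"),
    ("north", "2.00"),
]


def etl(conn, rows):
    """ETL: transform in application code, then load only the cleaned rows."""
    cleaned = [(region.upper(), float(amount)) for region, amount in rows]
    conn.execute("CREATE TABLE orders_etl (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", cleaned)


def elt(conn, rows):
    """ELT: load the raw rows as-is, then transform inside the warehouse with SQL."""
    conn.execute("CREATE TABLE orders_raw (region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", rows)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT UPPER(region) AS region, CAST(amount AS REAL) AS amount "
        "FROM orders_raw"
    )


def totals_by_region(conn, table):
    """A typical reporting query: total amount per region."""
    query = f"SELECT region, SUM(amount) FROM {table} GROUP BY region ORDER BY region"
    return conn.execute(query).fetchall()


# An in-memory database stands in for the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
etl(conn, RAW_ORDERS)
elt(conn, RAW_ORDERS)
etl_totals = totals_by_region(conn, "orders_etl")
elt_totals = totals_by_region(conn, "orders_elt")
```

Both routes end with the same reporting table. With ELT the untouched source data also stays in the warehouse (orders_raw here), so it can be transformed again later in a different way.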

Data Hub

A data hub is a collection of data from multiple sources and is the go-to place for data within an enterprise. It is an architectural concept that facilitates data sharing across an organization, thereby creating a collaborative data analytics structure.106 A data hub provides the data agility that today’s businesses need by promoting data sharing and governance. It complements data warehouses and data lakes but is not interchangeable with them; these concepts can be implemented together to create a fully data-enabled organization.

Data Lake

A data lake is a repository for storing heterogeneous data at scale, whatever its structure (structured, semi-structured, or unstructured).107 Data lakes became a popular concept with the rise of Hadoop, which offered storage at low cost. Over the years, the data lake has become a feature of cloud technologies like Amazon S3, while tools like Spark and Kafka are catching up as streaming data alternatives. A poorly managed data lake, however, can quickly turn into a data swamp that is neither usable nor trustworthy.

Data Mesh and Data Fabric

If you are hearing these terms for the first time, you are not alone; they are relatively new. Data fabric and data mesh take data sharing to the next level, though they differ in their approaches. It is fair to say that these emerging concepts are not yet fully understood and are still evolving. Data fabric is a unified architecture centered on technology and metadata (data about data) that connects data between business entities. It aims to bring together all company data across technologies and platforms and make it available through APIs. Data mesh, on the other hand, is a decentralized organizational architecture that treats each data set as a domain or product, with ownership of the data residing within the business domain.108

Robotic Process Automation (RPA)

Robotic Process Automation (RPA) is a technology for automating repetitive, rules-based, labor-intensive tasks.109 The RPA process involves creating software bots that can interact with any application or system the same way a human would. This is achieved by defining a list of actions and then having the bots perform them. Common use cases for RPA include completing data entry on forms, reading e-mails and acting on them, handling initial responses in customer service, and processing HR or employee information. The best part of an RPA solution is that it doesn’t require full-scale implementation from the start. You can start small by automating one task, testing it, and then deciding whether the solution makes sense for your organization. A common misconception is that RPA and Artificial Intelligence (AI) are the same. However, RPA is about copying human actions in a logical way, while AI is about gaining human-like learning and thinking. RPA and AI are separate technologies with slightly different focus areas, but they can be implemented together. UiPath, a robotic process automation software company, is bringing AI and RPA together to create intelligent automation.110
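The idea of defining a list of actions and having a bot perform them can be shown with a toy sketch. This is not how any particular RPA product works. The FakeForm class, the run_bot function, and the action names are hypothetical, invented here only to illustrate the rules-based nature of RPA.

```python
from dataclasses import dataclass, field


@dataclass
class FakeForm:
    """Hypothetical stand-in for an application screen the bot interacts with."""
    fields: dict = field(default_factory=dict)
    submitted: bool = False


def run_bot(form, actions):
    """Perform each scripted action in order, as a person working through the form would."""
    log = []
    for verb, *args in actions:
        if verb == "type":
            name, value = args
            form.fields[name] = value
        elif verb == "submit":
            form.submitted = True
        else:
            # The bot only follows rules it was given; it does not improvise.
            raise ValueError(f"unknown action: {verb}")
        log.append(verb)
    return log


# The "list of actions": fill in two fields, then submit the form.
actions = [
    ("type", "employee_name", "A. Example"),
    ("type", "department", "HR"),
    ("submit",),
]
form = FakeForm()
log = run_bot(form, actions)
```

The bot fails on any action it was not scripted for, which is where RPA differs from AI: it repeats predefined steps rather than learning new ones.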
