Data management and analytic practices have changed dramatically since I entered the industry in 1998. Data volumes are exploding beyond imagination, easily in the petabytes. There are many varieties of data that we are collecting, both structured and semi-structured data. We are acquiring data at much higher velocity, demanding daily renewal, sometimes even hourly. As the Greek philosopher Heraclitus so wisely stated centuries ago, “The only thing that is constant is change.”
The management of data, and how we handle and analyze it, has changed dramatically since the start of the “big data” era. Ultimately, all of the data must deliver information for decision making. It is definitely an exciting time that creates many challenges but also great opportunities for all of us to explore and adopt new and disruptive technologies to help with data management and analytical needs. And, now, the journey of this book begins.
I have attended a number of conferences where I have been able to share with both business and IT audiences the technologies that can help them more effectively manage their data, in return creating a more streamlined analytical life cycle. I have learned from customers the challenges they encounter and the fascinating things they are doing with agile analytics to drive innovation and gain competitive advantage for their companies. These are the biggest and most common themes:
A good friend of mine, who is an editor, approached me about writing a book that combines real-world customer successes with the concepts those customers adopted from presentations and white papers that I have authored over the years. After a few months of developing the abstracts, outlines, and chapters, we agreed to proceed with publishing this book, with a focus on customer success stories in each section. My goals for this book are to:
Whether you are from business or IT, I believe you will appreciate the real-world best practices and use cases that you can leverage in your profession. These best practices have been proven to help provide faster data-driven insights and decisions.
Writing this book was a privilege and honor. Mixed feelings went through my head as I started writing the book even though I was excited about sharing my experiences and customer successes with other IT and business professionals. The reasons for the mixed feelings were twofold:
Customer interactions are very important to me and a highlight in my profession. I have talked to many customers globally, tried to understand their business problems, and advised them on the appropriate technologies and solutions to solve their issues. I also have traveled around the world, sharing with customers and prospects the latest technologies and innovation in the market and how some of the leading-edge companies have adopted them to be more competitive and become the pioneers of managing data and applying analytics in a unified environment. Before I dive into the details, I believe it is appropriate to set the tone, define the terms referenced throughout this book, and describe some industry trends that demand inventive technologies to sustain leadership in a competitive, global economy. The topics of this book are focused on data management and analytics and how to unite the two into a single entity for optimal performance, economics, and governance, all of which are key initiatives for business and IT in many corporations.
The term data management has been around for a long time and has transformed into many other trendy buzzwords over the years. However, for simplification purposes, I will use the term data management since it is the foundation for this book. I define data management as a process by which data are acquired, integrated, and stored for data users to access. Data management is often associated with the ETL (extract, transform, and load) process, which prepares the data for the database or warehouse. The ETL process is very much embedded into the data management environment. The ultimate goal of the ETL process is to provide data users with reliable and timely data for analytics.
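To make the acquire, integrate, and store steps concrete, here is a minimal sketch of an ETL flow in Python. It is an illustration only, not a method taken from this book: the sample data, the `sales` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all hypothetical, and an in-memory SQLite database stands in for the database or warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for an acquired source file.
RAW_CSV = """customer_id,region,amount
1, east ,100.50
2,WEST,20
3,east,
"""


def extract(text):
    """Acquire: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Integrate: normalize region names and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records instead of guessing a value
        cleaned.append(
            (int(row["customer_id"]), row["region"].strip().lower(), float(amount))
        )
    return cleaned


def load(rows, conn):
    """Store: write the cleaned rows into a table that data users can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run the three stages in order."""
    load(transform(extract(text)), conn)


# In-memory SQLite is an assumption made so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
run_etl(RAW_CSV, conn)

# A data user can now run analytics against reliable, consistent data.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Keeping each stage in its own function mirrors the definition above: the query at the end only works because the regions were normalized and the incomplete record was dropped before loading.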
There are many definitions for analytics, and the focus on analytics has recently been on the rise. Its popularity has reemerged since the 1990s because many companies across industries have recognized the value of analytics and the field of data analysis to analyze the past, present, and future with data. Analytics can be very broad and has become the catch-all term for a variety of different business initiatives. According to Gartner, analytics is used to describe statistical and mathematical data analysis that clusters, segments, scores, and predicts what scenarios have happened, are happening, or are most likely to happen.1 Analytics has become the link between IT and business to exploit massive mounds of data. Based on my interactions with customers, I define analytics as a process of analyzing large amounts of data to gain knowledge and understanding about your business and to deliver data-driven decisions that improve or change the organization.
Now that the definitions have been established, let's examine the state of the IT industry and what customers are sharing with me regarding the challenges they encounter in their organizations:
These trends translate into challenges and opportunities for companies in every industry. The customers that I deal with consider these as their top three challenges:
Data are every organization's strategic asset. Data provide information for operational and strategic decisions. Because we are collecting many more types of data (from websites, social media, mobile, sensors, etc.) and the speed at which we collect the data has significantly accelerated, data volumes have grown exponentially. Customers that I have spoken to have doubled their data volumes in less than 24 months, a pace faster than the doubling roughly every 24 months that Moore's law, formulated over 50 years ago for transistor counts on integrated circuits, describes. With the pace of change escalating faster than ever, customers are looking for the latest innovation in technologies to satisfy the needs of both IT and business within a corporation and to transform every challenge into an opportunity that positively impacts profitability and the bottom line. I truly believe that new and innovative technologies such as in-database processing, in-memory analytics, and the emerging Hadoop technology will help tame the challenges of managing big data, uncover new opportunities with analytics, and deliver a higher return on investment by augmenting data management with integrated analytics.
This book is for business and IT professionals who want to learn about new and innovative technologies and learn what their peers have done to be successful in their line of work. It is for the business analysts who want to be smarter at delivering information to different parts of the organization. It is for the data scientists who want to explore new ways to apply analytics. It is for managers, directors, and executives who want to innovate and leverage analytics to make data-driven decisions impacting profitability and the livelihood of their business.
You should read this book if your profession is in one of these groups:
This book is ideal for professionals who want to improve the data management and analytical processes of their organization, explore new capabilities by applying analytics directly to the data, and learn from others how to be innovative and become pioneers in their organization.
This book can be read in a linear manner, chapter by chapter. It proceeds very much as a process of crawling, walking, sprinting, then running. However, if you are already familiar with the concept of in-database processing, in-memory analytics, or Hadoop, you can simply skip to the chapter that is most relevant to your situation. If you are not familiar with any of the topics, I highly suggest starting with Chapter 1, as it highlights the analytical life cycle of the data and data's typical journey to become information and insights for your organization. You can proceed to Chapters 2 to 4 (crawl, walk, sprint) to see how specific technologies can be applied directly to the data. Chapter 5 (how to run the relay) brings all of the elements together and shows how each technology can help to manage big data and advanced analytics. Chapter 6 discusses the top five focus areas in data management and analytics as well as possible future technologies.
Table 1 provides a description and focus for each chapter.
Table 1 Outline of the Chapters
Chapter | Description | Takeaway |
1 | The purpose of this chapter is to illustrate the typical life cycle of data and the stages (data exploration, data preparation, model development, and model deployment) involved in transforming data into strategic insights using analytics. | |
2 | The purpose of this chapter is to introduce the reader to the concept of in-database processing. In-database processing refers to the integration of advanced analytics into the database or data warehouse. With this capability, analytic processing is optimized to run where the data reside, in parallel, without having to copy or move the data for analysis. | |
3 | The purpose of this chapter is to introduce the reader to the concept of in-memory analytics. This latest innovation provides an entirely new approach to tackling big data by using an in-memory analytics engine to deliver super-fast responses to complex analytical problems. | |
4 | The purpose of this chapter is to explain the value of Hadoop. Organizations face unique big data challenges as they collect more data than ever before, both structured and semi-structured. There has never been a greater need for proactive and agile strategies to manage and integrate big data. | |
5 | The purpose of this chapter is to summarize and bring together the various technologies and concepts shared in Chapters 2–4. Combining traditional methods with modern and new approaches can save time and money for any organization. | |
6 | The purpose of this chapter is to conclude the book with the power of having an end-to-end data management and analytics platform for delivering data-driven decisions. It also provides final thoughts about the future of technologies. | |
An organization's most valuable asset is its customers. Yet right next to customers is the precious asset that the enterprise can leverage to attract, retain, and interact with those valuable customers for profitable growth: your data. Every organization that I have encountered has huge tidal waves of data streaming in from every direction, from multiple channels and a variety of sources. Data are everywhere, as far as the eye can see! All day, every day, data flow into and through the business and your database or data warehouse environment. Now, let's examine how all your data can be analyzed in an efficient and effective process to deliver data-driven decisions.