
Choose the right Azure data service and correct model design for successful implementation of your data model with the help of this hands-on guide

Key Features

  • Design a cost-effective, performant, and scalable database in Azure
  • Choose and implement the most suitable design for a database
  • Discover how your database can scale with growing data volumes, concurrent users, and query complexity

Book Description

Data is at the heart of all applications and forms the foundation of modern data-driven businesses. With the multitude of data-related use cases and the availability of different data services, choosing the right service and implementing the right design are paramount to success.

Data Modeling for Azure Data Services starts with an introduction to databases, entity analysis, and normalizing data. The book then shows you how to design a NoSQL database for optimal performance and scalability, and covers how to provision and implement Azure SQL DB, Azure Cosmos DB, and Azure Synapse SQL Pool. As you progress through the chapters, you'll learn about data analytics, Azure Data Lake, and Azure SQL Data Warehouse, and explore dimensional modeling and data vault modeling, along with designing and implementing a data lake using Azure Storage. You'll also learn how to implement ETL with Azure Data Factory.

By the end of this book, you'll have a solid understanding of which Azure data services are the best fit for your model and how to implement the best design for your solution.

What you will learn

  • Model a relational database using normalization, dimensional, or Data Vault modeling
  • Provision and implement Azure SQL DB and Azure Synapse SQL Pools
  • Discover how to model a Data Lake and implement it using Azure Storage
  • Model a NoSQL database, then provision and implement it with Azure Cosmos DB
  • Use Azure Data Factory to implement ETL/ELT processes
  • Create a star schema model using dimensional modeling

Who this book is for

This book is for business intelligence developers and consultants who work on modern cloud data warehousing and who design and implement databases. Beginner-level knowledge of cloud data management is expected.

Table of Contents

  1. Data Modeling for Azure Data Services
  2. Contributors
  3. About the author
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Share Your Thoughts
  6. Section 1 – Operational/OLTP Databases
  7. Chapter 1: Introduction to Databases
    1. Overview of relational databases
    2. Files
    3. Relational databases
    4. Introduction to Structured Query Language
    5. Different categories of SQL
    6. Understanding the database schema
    7. Impact of intended usage patterns on database design
    8. Understanding relational theory
    9. Pillar 1 – Elements of a set are not ordered
    10. Pillar 2 – All elements in a set are unique
    11. Keys
    12. Types of keys
    13. Choosing the primary key
    14. Integrity
    15. The Check and Unique constraints
    16. Types of workload
    17. OLTP
    18. OLAP
    19. Summary
  8. Chapter 2: Entity Analysis
    1. Scope
    2. Project scope
    3. Product scope
    4. Understanding entity relationship diagrams
    5. Entities
    6. Understanding super- and sub-entities
    7. Naming entities
    8. Relationships
    9. Types of relationships
    10. Drawing conventions
    11. Recap
    12. Creating your first ERD
    13. Context of an ERD
    14. Summary
    15. Exercises
    16. Exercise 1 – student registration
    17. Exercise 2 – airline
  9. Chapter 3: Normalizing Data
    1. When to use normalization as a design strategy
    2. Considering all the details
    3. Preventing redundancy
    4. How to avoid redundancy
    5. The normalization steps
    6. Step zero
    7. First normal form
    8. Second normal form
    9. Third normal form
    10. Boyce-Codd and the fourth normal form
    11. Normalizing – a recap
    12. An alternative approach to normalizing data
    13. Step 1
    14. Step 2
    15. Step 3
    16. Step 4
    17. Integrating separate results
    18. Entity relationship diagram
    19. Summary
    20. Exercises
    21. Exercise 1 – Stock management of a bicycle shop
  10. Chapter 4: Provisioning and Implementing an Azure SQL DB
    1. Technical requirements
    2. Understanding SQL Server data types
    3. Numerical data
    4. Alphanumerical data
    5. Varying-length data types
    6. Dates
    7. Other data types
    8. Quantifying the data model
    9. Estimating the database size
    10. Analyzing expected usage patterns
    11. Provisioning an Azure SQL database
    12. Provisioned versus serverless
    13. vCores versus DTU
    14. Hyperscale and Business Critical
    15. Elastic pool
    16. Networking
    17. Additional settings
    18. Tags
    19. Review + create
    20. Connecting to the database
    21. Azure portal
    22. Azure Data Studio
    23. Data definition language
    24. Creating a table
    25. Altering a table
    26. Dropping a table
    27. Inserting data
    28. Indexing
    29. Clustered index
    30. Nonclustered index
    31. Automatic tuning
    32. Summary
  11. Chapter 5: Designing a NoSQL Database
    1. Understanding big data
    2. Understanding big data clusters
    3. Partitioning
    4. Getting to know Cosmos DB
    5. JSON
    6. Modeling JSON
    7. Using embedding versus referencing
    8. Referring to objects
    9. Cosmos DB partitioning
    10. Putting it together
    11. Key-value databases
    12. Modeling key-value databases
    13. Other NoSQL databases
    14. Gremlin
    15. Cassandra
    16. Extra considerations
    17. Polyglot persistence
    18. Concurrency
    19. Summary
    20. Exercise
  12. Chapter 6: Provisioning and Implementing an Azure Cosmos DB Database
    1. Technical requirements
    2. Provisioning a Cosmos DB database
    3. Basics
    4. Networking
    5. Backup policy
    6. Encryption
    7. Creating a container
    8. Uploading documents to a container
    9. Cosmos DB container settings
    10. Importing data using the Azure Cosmos DB Data Migration tool
    11. Summary
  13. Section 2 – Analytics with a Data Lake and Data Warehouse
  14. Chapter 7: Dimensional Modeling
    1. Background to dimensional modeling
    2. Performance
    3. Consistency
    4. Data quality
    5. The complexity of normalized database schemas
    6. Lack of historical data
    7. Understanding dimensional modeling
    8. Minimizing redundancy
    9. Using dependencies between attributes
    10. Understanding star schemas
    11. Understanding fact tables
    12. Understanding dimension tables
    13. Steps in dimensional modeling
    14. Choosing a process and defining the scope
    15. Determining the needed grain
    16. Determining the dimensions
    17. Determining the facts
    18. Designing dimensions
    19. Defining the primary key of a dimension table
    20. Adding an unknown member
    21. Creating star schemas versus creating snowflake schemas
    22. Implementing a date dimension
    23. Slowly changing dimensions
    24. Junk dimension
    25. Degenerate dimension
    26. Designing fact tables
    27. Understanding additive facts
    28. Understanding semi-additive facts
    29. Understanding non-additive facts
    30. Understanding transactional fact tables
    31. Understanding periodic snapshot fact tables
    32. Understanding accumulating snapshot fact tables
    33. Understanding the roleplaying dimension
    34. Using a coverage fact table
    35. Using a Kimball data warehouse versus data marts
    36. Summary
    37. Exercise
  15. Chapter 8: Provisioning and Implementing an Azure Synapse SQL Pool
    1. Overview of Synapse Analytics
    2. Introducing SQL pools
    3. Introducing Spark pools
    4. Introducing data integration
    5. Provisioning a Synapse Analytics workspace
    6. Creating a dedicated SQL pool
    7. Implementing tables in Synapse SQL pools
    8. Using hash distribution
    9. Using replicated distribution
    10. Using ROUND_ROBIN distribution
    11. Implementing columnstore indexes
    12. Understanding workload management
    13. Creating a workload group
    14. Creating a workload classifier
    15. Using PolyBase to load data
    16. Enabling a SQL pool to access a data lake account
    17. Configuring and using PolyBase
    18. Using CTAS to import data
    19. Using COPY to import data
    20. Connecting to and using a dedicated SQL pool
    21. Working with Azure Data Studio
    22. Working with Power BI
    23. Summary
  16. Chapter 9: Data Vault Modeling
    1. Background to Data Vault modeling
    2. Designing Hub tables
    3. Defining the business key
    4. Implementing a hash key
    5. Adding the load date
    6. Adding the name of the source system
    7. Adding optional columns
    8. Designing Link tables
    9. Designing Satellite tables
    10. Adding optional columns to a Satellite
    11. Choosing the number of Satellites to use
    12. Using hash keys
    13. Designing a Data Vault structure
    14. Choosing the Hubs
    15. Choosing the Links
    16. Choosing the Satellites
    17. Designing business vaults
    18. Adding a Meta Mart
    19. Adding a Metrics Vault
    20. Adding an Error Mart
    21. Using Point-in-Time tables
    22. Adding Bridge tables
    23. Adding a hierarchical link
    24. Implementing a Data Vault
    25. Summary
    26. Exercise
  17. Chapter 10: Designing and Implementing a Data Lake Using Azure Storage
    1. Technical requirements
    2. Background of data lakes
    3. Modeling a data lake
    4. Defining data lake zones
    5. Defining a data lake folder structure
    6. Designing time slices
    7. Using different file formats
    8. AVRO file format
    9. Parquet file format
    10. ORC file format
    11. Choosing the proper file size
    12. Provisioning an Azure storage account
    13. Locally redundant storage (LRS)
    14. Zone-redundant storage (ZRS)
    15. Geo-redundant storage (GRS) and geo-zone-redundant storage (GZRS)
    16. Read-access geo-redundant storage (RA-GRS) and read-access geo-zone-redundant storage (RA-GZRS)
    17. Creating a data lake filesystem
    18. Creating multiple storage accounts
    19. Considering DTAP
    20. Considering data diversity
    21. Considering cost sensitivity
    22. Considering management overhead
    23. Summary
  18. Section 3 – ETL with Azure Data Factory
  19. Chapter 11: Implementing ETL Using Azure Data Factory
    1. Technical requirements
    2. Introducing Azure Data Factory
    3. Introducing the main components of Azure Data Factory
    4. Understanding activities
    5. Understanding datasets
    6. Understanding linked services
    7. Understanding pipelines
    8. Understanding triggers and integration runtimes
    9. Using the copy activity
    10. Copying a single table to the data lake
    11. Copying all tables to the data lake
    12. Implementing a data flow
    13. Executing SQL code from Data Factory
    14. Summary
    15. Why subscribe?
  20. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts