Data is one of the architecture domains that form the pillars of EA; the other pillars are business, application, and infrastructure. Data architecture enables the design, construction, and implementation of business-driven data entities that model real-world entities.
The data design artifacts of the data architecture pillar are leveraged in the implementation of the organization's physical databases. Data architecture can be compared to the architecture of a house, where the sizes, materials, roofing, rooms, plumbing layout, and electrical structures are all elucidated.
Examples of data reference models
The following is a list of data reference models for various industry domains:
TMF SID (telecom)
DODAF Logical Data Model
Energetics Epicenter data model
ARTS data model
Pipeline Open Data Standard (PODS) data model
Professional Petroleum Data Management (PPDM) Association data model
The following guidelines and rules are envisioned to be enduring and seldom modified, and support the organization in fulfilling its mission.
Principle | Summary
Data is an asset | Data is an asset that has value and is managed accordingly within the enterprise.
Data is shared | Data is shared across organizational lines of business (LOBs) so that business users have the data they need to perform their duties.
Data is accessible | Data is accessible to business users so that they can perform their duties.
Data trustee | Each data element has a trustee accountable for its quality.
Common vocabulary and data definitions | Data is consistently defined across the enterprise, and these definitions are available to all business users.
Data security | Data is protected against unauthorized access and disclosure.
Table 4: Data principles
Beginning with the methodology:
The process starts with the selection of frameworks and models to support the required views, using the selected methodology. Examples of data models include the DODAF Logical Data Model, the ARTS data model, the Pipeline Open Data Standard (PODS) data model, the Professional Petroleum Data Management (PPDM) Association data model, and the Energetics Epicenter data model. Confirm that all stakeholder requirements have been addressed. The recommended process for developing a data architecture is as follows:
The key top-level capabilities an organization needs in order to manage its data and information assets are listed as follows. This is referred to as the reference architecture for the data domain. The high-level capabilities for the data domain are:
Good-quality data means that all data, including master data, is complete, consistent, accurate, time-stamped, and based on industry standards. By improving the quality of its data, an organization can reduce costs, improve productivity, and accelerate speed to market. Quality organizational data is the foundation for collaboration and synchronization. Data quality improvement involves quality assessment, quality design, quality transformation, and quality monitoring.
The tool needed is Oracle Data Quality.
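The four quality steps named above can be sketched as a small, illustrative script. The field names and rules here are invented for the example; a real implementation would run inside a data quality tool against enterprise sources.

```python
# Minimal sketch of data quality assessment and transformation.
# Field names ("id", "name", "country") and rules are illustrative only.

def assess(records, required=("id", "name", "country")):
    """Quality assessment: count records missing any required field."""
    return sum(1 for r in records if any(not r.get(f) for f in required))

def transform(records):
    """Quality transformation: normalize values to a consistent standard."""
    for r in records:
        if r.get("country"):
            r["country"] = r["country"].strip().upper()  # e.g. "us " -> "US"
    return records

records = [
    {"id": 1, "name": "Ann", "country": "us "},
    {"id": 2, "name": "", "country": "DE"},   # incomplete record
]
print("incomplete records:", assess(records))        # -> 1
records = transform(records)
print("normalized country:", records[0]["country"])  # -> US
```

Quality monitoring would then re-run the assessment on a schedule and alert when the incomplete-record count rises.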
Strategies
Backup and recovery refer to the mechanisms and methodologies for protecting the organization's databases against data loss and for reconstructing the data after such a loss. A backup is a copy of data that can be leveraged to reconstruct that data. Backups are categorized into physical backups and logical backups:
Physical backups are the foundation of a sound backup strategy. Logical backups are useful add-ons to physical backups, but without physical backups they are not fool-proof protection against data loss.
The tool required is Oracle Data Guard.
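The two backup categories can be illustrated with SQLite from the Python standard library; this is a stand-in for an enterprise database, where the equivalent operations would be performed by tools such as Oracle's backup utilities.

```python
import sqlite3

# Illustrative sketch: physical vs. logical backups, using SQLite.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE accounts (id INTEGER, balance REAL)")
src.execute("INSERT INTO accounts VALUES (1, 100.0)")
src.commit()

# Physical backup: a page-level copy of the database itself.
phys = sqlite3.connect(":memory:")
src.backup(phys)

# Logical backup: SQL statements that can rebuild the same data anywhere.
logical_dump = "\n".join(src.iterdump())

restored = phys.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()
print(restored[0])  # -> 100.0
```

The physical copy restores fastest; the logical dump is portable across database versions, which is why the two are complementary rather than interchangeable.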
Key performance indicators (KPIs) are like checkpoints for meeting an organization's goals and objectives. Monitoring KPIs identifies progress toward marketing, sales, and customer service goals. A KPI is a quantifiable measure leveraged to understand organizational performance against an organizational goal. For any one goal there may be many candidate measures, and these are often narrowed down to just two or three key data points, which become the KPIs. KPIs are the measurements that most accurately show whether a business is progressing toward its target goals. Some are listed as follows:
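The idea of quantifying performance against a goal can be sketched as a small computation. The KPI names and target values below are invented for illustration.

```python
# Illustrative sketch: tracking KPIs against target goals.
# KPI names and targets are made up for the example.

kpis = {
    "monthly_active_users": {"actual": 42_000, "target": 50_000},
    "avg_response_hours":   {"actual": 6.0,    "target": 8.0},
}

def progress(actual, target, lower_is_better=False):
    """Quantify performance as a fraction of the target goal."""
    return target / actual if lower_is_better else actual / target

mau = progress(**kpis["monthly_active_users"])                    # 0.84
sla = progress(**kpis["avg_response_hours"], lower_is_better=True)
print(f"MAU progress: {mau:.0%}, response-time goal met: {sla >= 1.0}")
```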
Critical success factors
Critical success factors (CSFs) are the key variables or conditions that determine how effectively and successfully an organization meets its strategic goals for the program. Businesses have to perform CSF activities diligently in order to achieve their target objectives and retain a market lead:
Capabilities
Data synchronization is the mechanism that establishes consistency between a source and a target data store and harmonizes the data between them. It is key to many organizational scenarios, including file and mobile synchronization. There are two different synchronization capabilities:
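Whichever capability is used, the core harmonization step can be sketched as a one-way (source to target) sync; a two-way sync would additionally merge target changes back to the source. The record keys and values here are illustrative.

```python
# Minimal sketch of one-way synchronization: the target is harmonized
# with the source by applying inserts, updates, and deletes.

def sync_one_way(source: dict, target: dict) -> dict:
    for key, value in source.items():
        if target.get(key) != value:
            target[key] = value            # insert or update
    for key in set(target) - set(source):
        del target[key]                    # remove records absent from source
    return target

source = {"cust-1": "Ann", "cust-2": "Bob"}
target = {"cust-1": "Anne", "cust-3": "Eve"}   # stale copy
print(sync_one_way(source, target))  # -> {'cust-1': 'Ann', 'cust-2': 'Bob'}
```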
Approaches
The following list describes various approaches to consider for securing stored data:
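As one illustration of securing stored data, sensitive columns can be pseudonymized with a salted hash so the stored value is useless if leaked. This sketch uses only the standard library; for reversible protection, encryption (for example, AES) would be used instead, and all field names here are invented.

```python
import hashlib
import secrets

# Illustrative sketch: pseudonymizing a sensitive column at rest.
SALT = secrets.token_bytes(16)  # kept in a secrets vault, not with the data

def pseudonymize(value: str) -> str:
    """Replace a sensitive value with a salted SHA-256 digest."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()

row = {"customer": "Ann Smith", "ssn": "078-05-1120"}
stored = {**row, "ssn": pseudonymize(row["ssn"])}
print(stored["ssn"] != row["ssn"])  # -> True: the raw value never hits disk
```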
Data warehouses facilitate reporting on key business processes, also known as KPIs. They integrate data from various sources and provide a single point of truth about business metrics. Data warehouses are also leveraged for data mining, which helps with pattern recognition, forecasting, trend prediction, and so on. By integrating and archiving various data sources, a data warehouse makes it possible to analyze the business, including performance analysis, trends, and predictions, and to leverage the results to improve business efficiency.
A data warehouse is a data repository leveraged for management decision support. It holds a wide variety of data reflecting business conditions at a single point in time, and it is a repository of integrated data available for queries and analysis. The following are the different types of data warehousing:
A data warehouse stores data for analysis, whereas OLAP is leveraged for analyzing the data and managing aggregations and partitioning. Data marts are designed for a single domain; an organization may deploy different data marts for different departments such as HR, finance, R&D, and so on. These data marts are built on top of the warehouse. A data mart is a specialized category of data warehousing that contains a snapshot of operational data, helping business stakeholders analyze trends.
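The shape a departmental data mart is typically built on is a star schema: a fact table joined to dimension tables. A minimal sketch using SQLite (table and column names invented for the example):

```python
import sqlite3

# Illustrative star schema: one fact table plus one dimension table.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO fact_sales  VALUES (1, 100.0), (1, 50.0), (2, 200.0);
""")

# A typical mart query: aggregate a business metric by a dimension.
rows = db.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d USING (product_id)
    GROUP BY d.category ORDER BY d.category
""").fetchall()
print(rows)  # -> [('Hardware', 150.0), ('Software', 200.0)]
```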
Let's look at the differences:
OLTP systems handle a large number of short online transactions such as UPDATE, INSERT, and DELETE. The main emphasis of OLTP is on fast query processing and ensuring data integrity in a multi-access landscape, with effectiveness measured in transactions per second. An OLTP database holds detailed, current data, and the schema leveraged to store transactional data is the 3NF entity model.
Let's take a look at the differences.
Big data is a vast amount of data from various sources, generated by various applications, appliances, and systems. One has to perform a lot of cleaning, aggregation, and crunching of this data, and also run various algorithms depending on the objective of the analysis.
Business Intelligence (BI) is a set of tools and technologies that enables you to analyze, report on, and visualize data, and provides functions such as deep dives, slice and dice, and other related functions. It may or may not use big data as the source of its data for analysis. BI is also known as a Decision Support System (DSS), which refers to the technologies and practices for the collection, integration, and analysis of business-related data.
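The "slice and dice" operation at the heart of BI is aggregation of the same facts along different dimensions. A minimal sketch with invented sales data:

```python
from collections import defaultdict

# Illustrative slice-and-dice: roll up the same facts by two dimensions.
sales = [
    {"region": "EU", "product": "A", "amount": 100},
    {"region": "EU", "product": "B", "amount": 50},
    {"region": "US", "product": "A", "amount": 70},
]

def roll_up(rows, dimension):
    """Aggregate the amount measure along one chosen dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row["amount"]
    return dict(totals)

print(roll_up(sales, "region"))   # -> {'EU': 150, 'US': 70}
print(roll_up(sales, "product"))  # -> {'A': 170, 'B': 50}
```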
Three key points where big data differs from traditional analytics data are outlined in the following table:
Data Point | Description
Volume | The global quantity of digital data will grow from 130 exabytes to 40,000 exabytes by 2020. A petabyte is one quadrillion bytes, the equivalent of 20 million cabinets of text.
Velocity | The speed of data is even more critical than the volume. Real-time access to information enables organizations to make quicker decisions and stay ahead of the competition.
Variety | Big data comes in different shapes and forms: images on Facebook, e-mails, text messages, GPS signals, tweets, and other social media updates. These forms of data are known as unstructured data.
Table 5: Big Data Attributes
Hadoop is not a database but a software ecosystem that allows massively parallel computing. It is an enabler of certain types of NoSQL distributed databases, which allow data to be distributed across thousands of nodes with little reduction in performance.
A staple of the Hadoop ecosystem is MapReduce, a computational model that takes data-intensive processes and distributes the computation across a nearly limitless number of nodes, referred to as a cluster. It is a game-changer for the enormous processing needs of big data: a large data process that might take several hours on a centralized relational database system may take only a few minutes across a large Hadoop cluster, where all processing runs in parallel.
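The MapReduce idea can be shown in miniature with the classic word count: map a function over independent chunks (which Hadoop would spread across cluster nodes), then reduce the partial results into one answer. This is a conceptual sketch, not Hadoop's API.

```python
from collections import Counter
from functools import reduce

# Illustrative MapReduce: count words across independent data chunks.
chunks = ["big data big", "data cluster", "big cluster cluster"]

mapped = [Counter(chunk.split()) for chunk in chunks]   # map phase
counts = reduce(lambda a, b: a + b, mapped)             # reduce phase
print(counts["cluster"])  # -> 3
```

Because each map call touches only its own chunk, the map phase parallelizes trivially; only the reduce phase combines results.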
NoSQL
NoSQL represents a different framework of databases that allows high-performance processing of data at massive scale. It is a database infrastructure well adapted to the demands of big data. NoSQL achieves efficiency because, unlike a highly structured RDBMS, it is unstructured in nature, trading off stringent integrity requirements for agility and speed. NoSQL is based on the concept of distributed databases, where unstructured data may be stored across multiple nodes. This distributed architecture allows NoSQL databases to scale horizontally as data continues to explode, with no reduction in performance. The NoSQL database infrastructure has been the solution to handling some of the biggest data stores on the planet, for the likes of Amazon and Google.
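The distribution idea behind such stores can be sketched as hash-based sharding: each key is hashed to one of N nodes, so the data set scales horizontally. Real systems add consistent hashing and replication; the node names and records below are invented.

```python
import hashlib

# Illustrative key-value sharding across a fixed set of nodes.
NODES = ["node-0", "node-1", "node-2"]
cluster = {n: {} for n in NODES}

def node_for(key: str) -> str:
    """Hash the key to pick the node responsible for it."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

def put(key, value):
    cluster[node_for(key)][key] = value

def get(key):
    return cluster[node_for(key)].get(key)

put("user:42", {"name": "Ann"})
print(get("user:42"))  # -> {'name': 'Ann'}
```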
The following list describes the tools and techniques for managing enterprise data artifacts:
ETL stands for Extract, Transform, and Load. ETL is leveraged to read data from a specified source and extract a desired subset of it. Next, it transforms the data using rules and lookup tables, converting it to the target state, and the resulting data is loaded into the target database.
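The three steps can be shown end to end in miniature. The CSV source, column names, and transformation rules are invented for the example; a real pipeline would use an ETL tool against enterprise sources.

```python
import csv
import io
import sqlite3

# Illustrative ETL: CSV source -> cleaned rows -> relational target.
raw = "id,amount,currency\n1, 100 ,usd\n2, 250 ,eur\n"

# Extract: read the desired subset from the source.
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: apply rules to reach the target state.
for r in rows:
    r["amount"] = float(r["amount"].strip())
    r["currency"] = r["currency"].upper()

# Load: write the result into the target database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payments (id INTEGER, amount REAL, currency TEXT)")
db.executemany("INSERT INTO payments VALUES (:id, :amount, :currency)", rows)
total = db.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
print(total)  # -> 350.0
```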