Data is one of the architecture domains that form the pillars of EA; the other pillars are business, application, and infrastructure. Data architecture enables the design, construction, and implementation of business-driven data entities that model real-world entities.
The data design artifacts of the data architecture pillar are leveraged in the implementation of the organization's physical databases. Data architecture can be compared to the architecture of a house, where the sizes, materials, roofing, rooms, plumbing layout, and electrical structures are all elucidated.
Examples of data reference models
The following is a list of data reference models for various industry domains:
TMF SID (telecom)
DODAF Logical Data Model
Energetics Epicenter data model
ARTS data model
Pipeline Open Data Standard (PODS) data model
Professional Petroleum Data Management (PPDM) Association data model
The following guidelines and rules are envisioned to be enduring and seldom modified, and support the organization in fulfilling its mission.
Principle | Summary
Data is an asset | Data is an asset that has value and is managed accordingly within the enterprise.
Data is shared | Data is shared across organizational lines of business (LOBs) so that business users have the data they need to perform their duties.
Data is accessible | Data is accessible to business users so that they can perform their duties.
Data trustee | Each data element has a trustee accountable for its quality.
Common vocabulary and data definitions | Data is consistently defined across the enterprise, and these definitions are available to all business users.
Data security | Data is protected against unauthorized access and disclosure.
Table 4: Data principles
Beginning with the methodology:
The process starts with the selection of frameworks and models to support the required views, using the selected methodology. Examples of data models include the DODAF Logical Data Model, the ARTS data model, the Pipeline Open Data Standard (PODS) data model, the Professional Petroleum Data Management (PPDM) Association data model, and the Energetics Epicenter data model. Confirm that all stakeholder requirements have been addressed. The recommended process for developing a data architecture is as follows:
The key top-level capabilities an organization needs in order to manage its data and information assets are listed as follows. This is referred to as the reference architecture for the data domain. The high-level capabilities for the data domain are:
Good-quality data means that all data, including master data, is complete, consistent, accurate, time-stamped, and based on industry standards. By improving the quality of its data, an organization can reduce costs, improve productivity, and accelerate speed to market. Quality organizational data is the foundation for collaboration and synchronization. Data quality improvement involves quality assessment, quality design, quality transformation, and quality monitoring.
The tool needed is Oracle Data Quality.
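The four quality steps named above can be sketched as a small, illustrative script. The field names and rules here are invented for the example; a real implementation would run inside a data quality tool against enterprise sources.

```python
# Minimal sketch of data quality assessment and transformation.
# Field names ("id", "name", "country") and rules are illustrative only.

def assess(records, required=("id", "name", "country")):
    """Quality assessment: count records missing any required field."""
    return sum(1 for r in records if any(not r.get(f) for f in required))

def transform(records):
    """Quality transformation: normalize values to a consistent standard."""
    for r in records:
        if r.get("country"):
            r["country"] = r["country"].strip().upper()  # e.g. "us " -> "US"
    return records

records = [
    {"id": 1, "name": "Ann", "country": "us "},
    {"id": 2, "name": "", "country": "DE"},   # incomplete record
]
print("incomplete records:", assess(records))        # -> 1
records = transform(records)
print("normalized country:", records[0]["country"])  # -> US
```

Quality monitoring would then re-run the assessment on a schedule and alert when the incomplete-record count rises.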
Strategies
Backup and recovery refer to the mechanisms and methodologies for protecting the organization's databases against data loss and for reconstructing the data after such a loss. A backup is a copy of data that can be leveraged to reconstruct that data. Backups are categorized into physical backups and logical backups:
Physical backups are the foundation of a sound backup strategy. Logical backups are useful add-ons to physical backups, but without physical backups they are not fool-proof protection against data loss.
The tool required is Oracle Data Guard.
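The two backup categories can be illustrated with SQLite from the Python standard library; this is a stand-in for an enterprise database, where the equivalent operations would be performed by tools such as Oracle's backup utilities.

```python
import sqlite3

# Illustrative sketch: physical vs. logical backups, using SQLite.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE accounts (id INTEGER, balance REAL)")
src.execute("INSERT INTO accounts VALUES (1, 100.0)")
src.commit()

# Physical backup: a page-level copy of the database itself.
phys = sqlite3.connect(":memory:")
src.backup(phys)

# Logical backup: SQL statements that can rebuild the same data anywhere.
logical_dump = "\n".join(src.iterdump())

restored = phys.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()
print(restored[0])  # -> 100.0
```

The physical copy restores fastest; the logical dump is portable across database versions, which is why the two are complementary rather than interchangeable.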
Key performance indicators (KPIs) are like checkpoints for meeting an organization's goals and objectives. Monitoring KPIs identifies progress toward marketing, sales, and customer service goals. A KPI is a quantifiable measure leveraged to understand organizational performance against an organizational goal. For any one goal there may be many candidate measures, and these are often narrowed down to just two or three key data points, which become the KPIs. KPIs are the measurements that most accurately show whether a business is progressing toward its target goals. Some are listed as follows:
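The idea of quantifying performance against a goal can be sketched as a small computation. The KPI names and target values below are invented for illustration.

```python
# Illustrative sketch: tracking KPIs against target goals.
# KPI names and targets are made up for the example.

kpis = {
    "monthly_active_users": {"actual": 42_000, "target": 50_000},
    "avg_response_hours":   {"actual": 6.0,    "target": 8.0},
}

def progress(actual, target, lower_is_better=False):
    """Quantify performance as a fraction of the target goal."""
    return target / actual if lower_is_better else actual / target

mau = progress(**kpis["monthly_active_users"])                    # 0.84
sla = progress(**kpis["avg_response_hours"], lower_is_better=True)
print(f"MAU progress: {mau:.0%}, response-time goal met: {sla >= 1.0}")
```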
Critical success factors
Critical success factors (CSFs) are the key variables or conditions that determine how effectively and successfully an organization meets its strategic goals for the program. Businesses have to perform CSF activities diligently in order to achieve their target objectives and retain a market lead:
Capabilities
Data synchronization is the mechanism that establishes consistency between a source and a target data store and harmonizes the data between them. It is key to many organizational scenarios, including file and mobile synchronization. There are two different synchronization capabilities:
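Whichever capability is used, the core harmonization step can be sketched as a one-way (source to target) sync; a two-way sync would additionally merge target changes back to the source. The record keys and values here are illustrative.

```python
# Minimal sketch of one-way synchronization: the target is harmonized
# with the source by applying inserts, updates, and deletes.

def sync_one_way(source: dict, target: dict) -> dict:
    for key, value in source.items():
        if target.get(key) != value:
            target[key] = value            # insert or update
    for key in set(target) - set(source):
        del target[key]                    # remove records absent from source
    return target

source = {"cust-1": "Ann", "cust-2": "Bob"}
target = {"cust-1": "Anne", "cust-3": "Eve"}   # stale copy
print(sync_one_way(source, target))  # -> {'cust-1': 'Ann', 'cust-2': 'Bob'}
```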
Approaches
The following list describes various approaches to consider for securing stored data:
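As one illustration of securing stored data, sensitive columns can be pseudonymized with a salted hash so the stored value is useless if leaked. This sketch uses only the standard library; for reversible protection, encryption (for example, AES) would be used instead, and all field names here are invented.

```python
import hashlib
import secrets

# Illustrative sketch: pseudonymizing a sensitive column at rest.
SALT = secrets.token_bytes(16)  # kept in a secrets vault, not with the data

def pseudonymize(value: str) -> str:
    """Replace a sensitive value with a salted SHA-256 digest."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()

row = {"customer": "Ann Smith", "ssn": "078-05-1120"}
stored = {**row, "ssn": pseudonymize(row["ssn"])}
print(stored["ssn"] != row["ssn"])  # -> True: the raw value never hits disk
```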
Data warehouses facilitate reporting on key business processes, also known as KPIs. They integrate data from various sources and provide a single point of truth about business metrics. Data warehouses are also leveraged for data mining, which helps with pattern recognition, forecasting, trend prediction, and so on. By integrating and archiving various data sources, a data warehouse makes it possible to analyze the business, including performance analysis, trends, and predictions, and to leverage the results to improve business efficiency.
A data warehouse is a data repository leveraged for management decision support. It holds a wide variety of data reflecting business conditions at a single point in time, and it is a repository of integrated data available for queries and analysis. The following are the different types of data warehousing:
A data warehouse stores data for analysis, whereas OLAP is leveraged for analyzing the data and managing aggregations and partitioning. Data marts are designed for a single domain; an organization may deploy different data marts for different departments such as HR, finance, R&D, and so on. These data marts are built on top of the warehouse. A data mart is a specialized category of data warehousing that contains a snapshot of operational data, helping business stakeholders analyze trends.
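The shape a departmental data mart is typically built on is a star schema: a fact table joined to dimension tables. A minimal sketch using SQLite (table and column names invented for the example):

```python
import sqlite3

# Illustrative star schema: one fact table plus one dimension table.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO fact_sales  VALUES (1, 100.0), (1, 50.0), (2, 200.0);
""")

# A typical mart query: aggregate a business metric by a dimension.
rows = db.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d USING (product_id)
    GROUP BY d.category ORDER BY d.category
""").fetchall()
print(rows)  # -> [('Hardware', 150.0), ('Software', 200.0)]
```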
Let's look at the differences:
OLTP systems handle a large number of short online transactions such as UPDATE, INSERT, and DELETE. The main emphasis of OLTP is on fast query processing and ensuring data integrity in a multi-access landscape, with effectiveness measured in transactions per second. An OLTP database holds detailed, current data, and the schema leveraged to store transactional data is the 3NF entity model.
Let's take a look at the differences.
Big data is a vast amount of data from various sources, generated by various applications, appliances, and systems. One has to perform a lot of cleaning, aggregation, and crunching of this data, and also run various algorithms depending on the objective of the analysis.
Business Intelligence (BI) is a set of tools and technologies that enables you to analyze, report on, and visualize data, and provides functions such as deep dives, slice and dice, and other related functions. It may or may not use big data as the source of its data for analysis. BI is also known as a Decision Support System (DSS), which refers to the technologies and practices for the collection, integration, and analysis of business-related data.
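The "slice and dice" operation at the heart of BI is aggregation of the same facts along different dimensions. A minimal sketch with invented sales data:

```python
from collections import defaultdict

# Illustrative slice-and-dice: roll up the same facts by two dimensions.
sales = [
    {"region": "EU", "product": "A", "amount": 100},
    {"region": "EU", "product": "B", "amount": 50},
    {"region": "US", "product": "A", "amount": 70},
]

def roll_up(rows, dimension):
    """Aggregate the amount measure along one chosen dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row["amount"]
    return dict(totals)

print(roll_up(sales, "region"))   # -> {'EU': 150, 'US': 70}
print(roll_up(sales, "product"))  # -> {'A': 170, 'B': 50}
```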
Three key points where big data differs from traditional analytics data are outlined in the following table:
Data Point | Description
Volume | The global quantity of digital data will grow from 130 exabytes to 40,000 exabytes by 2020. A petabyte is one quadrillion bytes, the equivalent of 20 million cabinets of text.
Velocity | The speed of data is even more critical than the volume. Real-time access to information enables organizations to make quicker decisions and stay ahead of the competition.
Variety | Big data comes in different shapes and forms: images on Facebook, e-mails, text messages, GPS signals, tweets, and other social media updates. These forms of data are known as unstructured data.
Table 5: Big Data Attributes
Hadoop is not a database but a software ecosystem that allows massively parallel computing. It is an enabler of certain types of NoSQL distributed databases, which allow data to be distributed across thousands of nodes with little reduction in performance.
A staple of the Hadoop ecosystem is MapReduce, a computational model that takes data-intensive processes and distributes the computation across a nearly limitless number of nodes, referred to as a cluster. It is a game-changer for the enormous processing needs of big data: a large data process that might take several hours on a centralized relational database system may take only a few minutes across a large Hadoop cluster, where all processing runs in parallel.
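The MapReduce idea can be shown in miniature with the classic word count: map a function over independent chunks (which Hadoop would spread across cluster nodes), then reduce the partial results into one answer. This is a conceptual sketch, not Hadoop's API.

```python
from collections import Counter
from functools import reduce

# Illustrative MapReduce: count words across independent data chunks.
chunks = ["big data big", "data cluster", "big cluster cluster"]

mapped = [Counter(chunk.split()) for chunk in chunks]   # map phase
counts = reduce(lambda a, b: a + b, mapped)             # reduce phase
print(counts["cluster"])  # -> 3
```

Because each map call touches only its own chunk, the map phase parallelizes trivially; only the reduce phase combines results.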
NoSQL
NoSQL represents a different framework of databases that allows high-performance processing of data at massive scale. It is a database infrastructure well adapted to the demands of big data. NoSQL achieves efficiency because, unlike a highly structured RDBMS, it is unstructured in nature, trading off stringent integrity requirements for agility and speed. NoSQL is based on the concept of distributed databases, where unstructured data may be stored across multiple nodes. This distributed architecture allows NoSQL databases to scale horizontally as data continues to explode, with no reduction in performance. The NoSQL database infrastructure has been the solution to handling some of the biggest data stores on the planet, for the likes of Amazon and Google.
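The distribution idea behind such stores can be sketched as hash-based sharding: each key is hashed to one of N nodes, so the data set scales horizontally. Real systems add consistent hashing and replication; the node names and records below are invented.

```python
import hashlib

# Illustrative key-value sharding across a fixed set of nodes.
NODES = ["node-0", "node-1", "node-2"]
cluster = {n: {} for n in NODES}

def node_for(key: str) -> str:
    """Hash the key to pick the node responsible for it."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

def put(key, value):
    cluster[node_for(key)][key] = value

def get(key):
    return cluster[node_for(key)].get(key)

put("user:42", {"name": "Ann"})
print(get("user:42"))  # -> {'name': 'Ann'}
```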
The following list describes the tools and techniques for managing enterprise data artifacts:
ETL stands for Extract, Transform, and Load. ETL is leveraged to read data from a specified source and extract a desired subset of it. Next, it transforms the data using rules and lookup tables, converting it to the target state, and the resulting data is loaded into the target database.
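The three steps can be shown end to end in miniature. The CSV source, column names, and transformation rules are invented for the example; a real pipeline would use an ETL tool against enterprise sources.

```python
import csv
import io
import sqlite3

# Illustrative ETL: CSV source -> cleaned rows -> relational target.
raw = "id,amount,currency\n1, 100 ,usd\n2, 250 ,eur\n"

# Extract: read the desired subset from the source.
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: apply rules to reach the target state.
for r in rows:
    r["amount"] = float(r["amount"].strip())
    r["currency"] = r["currency"].upper()

# Load: write the result into the target database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payments (id INTEGER, amount REAL, currency TEXT)")
db.executemany("INSERT INTO payments VALUES (:id, :amount, :currency)", rows)
total = db.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
print(total)  # -> 350.0
```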