Data integration

Data integration combines data from multiple sources to form a coherent data store. The common issues here are as follows:

  • Heterogeneous data: The sources share no common key
  • Different definitions: This problem is intrinsic, that is, the same data is defined differently across sources, such as under different database schemas
  • Time synchronization: The data may not have been gathered over the same time periods
  • Legacy data: This refers to data left over from an old system
  • Sociological factors: These limit what data can be gathered

There are several approaches that deal with the above issues:

  • Entity identification problem: Schema integration and object matching are tricky. This is referred to as the entity identification problem.
  • Redundancy and correlation analysis: Some redundancies can be detected by correlation analysis. Given two attributes, such an analysis can measure how strongly one attribute implies the other, based on the available data.
  • Tuple duplication: In addition to redundancies between attributes, duplication should also be detected at the tuple level
  • Data value conflict detection and resolution: Attributes may differ in abstraction level, where an attribute in one system is recorded at a lower or higher abstraction level than the same attribute in another system
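
The redundancy and duplication checks above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes the Pearson correlation coefficient as the correlation measure for numeric attributes and treats two tuples as duplicates only when every value matches exactly. The function names `pearson` and `find_duplicate_tuples` and the sample data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric attributes.

    Values near +1 or -1 suggest one attribute is largely redundant
    given the other; values near 0 suggest no linear relationship.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("attributes must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / sqrt(var_x * var_y)


def find_duplicate_tuples(rows):
    """Return (first_index, duplicate_index) pairs for exactly repeated rows."""
    seen = {}          # row contents -> index of first occurrence
    duplicates = []
    for index, row in enumerate(rows):
        key = tuple(row)
        if key in seen:
            duplicates.append((seen[key], index))
        else:
            seen[key] = index
    return duplicates


# Hypothetical data: the same height recorded in two units by two sources.
height_cm = [150.0, 160.0, 170.0, 180.0]
height_in = [h / 2.54 for h in height_cm]
correlation = pearson(height_cm, height_in)   # very close to 1.0

# Hypothetical integrated table with one repeated tuple.
rows = [("alice", 30), ("bob", 25), ("alice", 30)]
repeated = find_duplicate_tuples(rows)        # [(0, 2)]
```

A correlation close to 1 here flags the two height columns as candidates for merging. Exact matching is the simplest duplicate test; records that differ only in spelling or formatting would need a fuzzier comparison.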