Identifying, deduplicating and merging duplicate data

Duplicate Check

When source databases are maintained in a distributed fashion, the same addresses are entered repeatedly, so the additional fields for one and the same customer hold redundant or conflicting information. The objective of a marketing database, to know all the necessary information about each customer, is then missed. Various automatic and heuristic methods, such as knowledge-based tables, fuzzy logic, phonetic comparisons, bigram and trigram string comparisons, and acronym handling, can identify and merge the records that represent the same real-world object.
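
As an illustration of the trigram string comparison mentioned above, the following minimal Python sketch scores two names with the Dice coefficient over their character trigrams. The function names, the padding scheme and the 0.7 threshold are assumptions chosen for this example. They do not describe any particular product, and a real duplicate check would combine several of the methods listed.

```python
def ngrams(text: str, n: int = 3) -> set:
    """Return the set of character n-grams of a normalized, padded string."""
    # Normalize case and collapse runs of whitespace (assumed normalization).
    normalized = " ".join(text.lower().split())
    # Pad so that the start and end of the string also form n-grams.
    padded = " " * (n - 1) + normalized + " "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient of the two n-gram sets: 0.0 (disjoint) to 1.0 (equal)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    # Because of the padding, both sets are never empty, so no division by zero.
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def is_probable_duplicate(a: str, b: str, threshold: float = 0.7) -> bool:
    """Flag a pair as a duplicate candidate; the threshold is a hypothetical value."""
    return dice_similarity(a, b) >= threshold
```

In this sketch a spelling variant such as "Schmidt AG" versus "Schmitt AG" scores far higher than an unrelated pair such as "Schmidt AG" versus "Bauer KG". Pairs above the threshold would normally be treated as candidates for review or further checks, not merged blindly.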

In the area of business data, typical duplicate types are company-name, relocation and contact-person duplicates. For consumer data, they are household, person, marriage (name change) and relocation duplicates.

Merging duplicates consolidates the data set and provides a unified view of each customer. The objectives are:

Determination of revenue and activity per customer / household
Targeting of acquisition activities, e.g. at new customers only
Enrichment with external information without purchasing redundant information
Consolidation of different source systems
Negative matching against block lists (suppression lists)
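
The merge step itself can be sketched as follows, assuming the duplicate records of one customer have already been grouped. This is a minimal Python illustration rather than a prescribed procedure: the rule "first non-empty value wins", the function name `merge_records` and the field names in the usage example are all hypothetical.

```python
def merge_records(records):
    """Merge duplicate records (dicts) into one consolidated record.

    Returns (merged, conflicts). `conflicts` maps each field to the list of
    differing non-empty values found, so they can be reviewed manually.
    """
    merged = {}
    conflicts = {}
    for record in records:
        for field, value in record.items():
            if value in (None, ""):
                continue  # empty values never overwrite existing information
            if field not in merged:
                merged[field] = value  # first non-empty value wins (assumed rule)
            elif merged[field] != value:
                # Remember every differing value, starting with the kept one.
                values = conflicts.setdefault(field, [merged[field]])
                if value not in values:
                    values.append(value)
    return merged, conflicts


# Usage with two hypothetical records for the same company:
duplicates = [
    {"name": "Schmidt AG", "phone": "", "city": "Berlin"},
    {"name": "Schmidt AG", "phone": "030 123", "city": "Hamburg"},
]
merged, conflicts = merge_records(duplicates)
```

Here the phone number missing from the first record is filled in from the second, while the differing city values are reported as a conflict instead of being silently overwritten, which fits the relocation duplicates described above.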