Original Article

Exploring Attribute Correspondences Across Heterogeneous Databases by Mutual Information

Pages 305-336 | Published online: 08 Dec 2014
 

Abstract

Identifying attribute correspondences across heterogeneous databases is a critical and time-consuming step in integrating them. Past research has applied correlation analysis techniques to explore correspondences between attributes. These techniques, however, are appropriate only for numeric attributes that are linearly related. This paper proposes an information-theoretic approach to exploring correspondences between attributes in heterogeneous databases. The proposed approach is applicable to character attributes as well as to numeric attributes, regardless of whether they are linearly related. It overcomes some serious shortcomings of previous approaches based on correlation analysis and has much broader applicability. The proposed procedure samples both matching and nonmatching pairs of records from the databases under consideration, applies matching functions to compare pairs of attributes, and then uses mutual information to measure the dependency between the outcome of a matching function applied to a pair of attributes and the class (i.e., matching or nonmatching) of the pair of records. A high mutual information index implies a potential attribute correspondence, which is presented to the analyst for further evaluation. The paper also presents empirical results demonstrating the utility of the proposed approach.
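
The procedure described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the matching functions or how the mutual information index is estimated, so the exact-match function, the plug-in estimate from observed counts, the attribute names (`name`, `city`, `surname`, `town`), and the sample record pairs below are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def exact_match(a, b):
    """Hypothetical matching function: 1 if the two values are equal, else 0."""
    return int(a == b)


def score_attribute_pairs(sampled_pairs, attrs_a, attrs_b, match_fn=exact_match):
    """Score every (attribute of A, attribute of B) pair by the mutual
    information between the matching-function outcome and the pair's class."""
    classes = [label for _, _, label in sampled_pairs]
    scores = {}
    for a in attrs_a:
        for b in attrs_b:
            outcomes = [match_fn(ra[a], rb[b]) for ra, rb, _ in sampled_pairs]
            scores[(a, b)] = mutual_information(outcomes, classes)
    return scores


# Hypothetical sample: (record from database A, record from database B, class),
# where class 1 means a matching pair of records and 0 a nonmatching pair.
sampled_pairs = [
    ({"name": "Smith", "city": "Austin"}, {"surname": "Smith", "town": "Austin"}, 1),
    ({"name": "Jones", "city": "Boston"}, {"surname": "Jones", "town": "Boston"}, 1),
    ({"name": "Lee", "city": "Austin"}, {"surname": "Lee", "town": "Austin"}, 1),
    ({"name": "Smith", "city": "Austin"}, {"surname": "Jones", "town": "Austin"}, 0),
    ({"name": "Jones", "city": "Boston"}, {"surname": "Lee", "town": "Austin"}, 0),
    ({"name": "Lee", "city": "Austin"}, {"surname": "Smith", "town": "Boston"}, 0),
]

scores = score_attribute_pairs(sampled_pairs, ["name", "city"], ["surname", "town"])

# Rank candidate correspondences, highest mutual information first.
for pair, mi in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(mi, 3))
```

In this toy sample, `name` and `surname` agree exactly when the records match, so that pair receives the highest score; `city` and `town` also agree on one nonmatching pair, so they score lower; the unrelated pairs never agree and score zero. Because the estimate only needs discrete outcomes of a matching function, the same code applies to character and numeric attributes alike, which is the advantage over correlation analysis that the abstract claims.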
