Record Linkage References

Why is this important

  • [RECOMMENDED] Goth G. Running on EMPI. Health information exchanges and the ONC keep trying to find the secret sauce of patient matching. Health Data Management. 2014;22(2):52, 54, 56 passim.

Detailed survey in computer science

  • [RECOMMENDED] Ahmed K. Elmagarmid, Panagiotis G. Ipeirotis, Vassilios S. Verykios. Duplicate Record Detection: A Survey. IEEE Transactions on Knowledge and Data Engineering, vol. 19, January 2007.
  • [RECOMMENDED] M. Elfeky, V. Verykios, A. Elmagarmid. TAILOR: A Record Linkage Tool Box. In Proceedings of the 18th International Conference on Data Engineering (ICDE 2002). IEEE Computer Society, Washington, DC, USA
  • N. Koudas, S. Sarawagi, and D. Srivastava. Record linkage: similarity measures and algorithms. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data (SIGMOD ’06). ACM, New York, NY, USA, 802-803. DOI=10.1145/1142473.1142599 http://doi.acm.org/10.1145/1142473.1142599

What is actually done in the field

  • [RECOMMENDED] S. Weber, H. Lowe, A. Das, et al. A simple heuristic for blindfolded record linkage. J Am Med Inform Assoc. 2012.
  • [RECOMMENDED] F. Boscoe, D. Schrag, K. Chen, et al. Building capacity to assess cancer care in the Medicaid population in New York State. Health Services Research 2011;46(3): 805-20
  • https://www.census.gov/srd/papers/pdf/rrs2006-02.pdf

Private Record Linkage

  • [RECOMMENDED] Rob Hall and Stephen E. Fienberg. Privacy-Preserving Record Linkage. Privacy in Statistical Databases 2010: Lecture Notes in Computer Science, 2011, Volume 6344/2011, pp 269-283, DOI: 10.1007/978-3-642-15838-4_24.
  • Vatsalan, D., Christen, P., & Verykios, V. S. (2013). A taxonomy of privacy-preserving record linkage techniques. Information Systems, 38(6), 946-969
  • L. Bonomi, L. Xiong, J. Lu. LinkIT: Privacy Preserving Record Linkage and Integration via Transformations (demo track). In SIGMOD, 2013
  • http://hiplab.mc.vanderbilt.edu/projects/soempi/ (most recent work in the field)
  • A. Inan, M. Kantarcioglu, E. Bertino, and M. Scannapieco. A hybrid approach to private record linkage. In ICDE, pp 496-505. IEEE, 2008
  • T. Churches and P. Christen. Blind data linkage using n-gram similarity comparisons. In H. Dai, R. Srikant, and C. Zhang, editors, PAKDD, volume 3056 of Lecture Notes in Computer Science, pp 121-126. Springer, 2004

Recent papers based on data mining and machine learning techniques

  • McCoy AB, Wright A, Kahn MG, Shapiro JS, Bernstam EV, Sittig DF. Matching identifiers in electronic health records: implications for duplicate records and patient safety. Bmj Quality & Safety. Mar 2013;22(3):219-224.
  • Peter Christen. 2008. Automatic Record Linkage using Seeded Nearest Neighbor and Support Vector Machine Classification. Proceedings of the ACM SIGKDD 2008 conference, Las Vegas, August 2008.
  • Sunita Sarawagi and Anuradha Bhamidipaty. 2002. Interactive deduplication using active learning. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ’02). ACM, New York, NY, USA, 269-278. DOI=10.1145/775047.775087 http://doi.acm.org/10.1145/775047.775087
  • M. Bilenko, B. Kamath, and R. J. Mooney. Adaptive Blocking: Learning to Scale Up Record Linkage. In Sixth International Conference on Data Mining (ICDM ’06), pp. 87-96, 18-22 Dec. 2006. doi: 10.1109/ICDM.2006.13

Cornerstone Papers for Probabilistic Record Linkage

  • H. B. Newcombe, J. M. Kennedy, S. J. Axford, and A. P. James. Automatic Linkage of Vital Records, Science, 130, pp. 954-959. 1959
  • I. P. Fellegi and A. B. Sunter. A theory for record linkage. Journal of the American Statistical Association 1969;64: pp 1183–1210

Papers that look at the impact of record linkage on analysis

  • I. Baldi, A. Ponti, R. Zanetti, G. Ciccone, F. Merletti, and D. Gregori. The impact of record-linkage bias in the Cox model. Journal of Evaluation in Clinical Practice. 16: 92-96. 2010.
  • P. Lahiri and M. Larsen. Regression analysis with linked data. Journal of the American Statistical Association, 100(469):222-230, March 2005
  • F. Scheuren and W. E. Winkler. Regression Analysis of Data Files That Are Computer Matched – Part II. Survey Methodology, 23, 157-165. 1997.

Available Software

  • P. Jurczyk, J. J. Lu, L. Xiong, J. D. Cragan, A. Correa. FRIL: A Tool for Comparative Record Linkage. American Medical Informatics Association (AMIA) 2008 Annual Symposium.
  • Febrl
  • Linkagewiz. http://www.linkagewiz.com/index.htm
  • K. Campbell, D. Deck, and A. Krupski. Record linkage software in the public domain: a comparison of Link Plus, The Link King, and a 'basic' deterministic algorithm. Health Informatics Journal, March 2008, vol. 14, no. 1, pp. 5-15

Two CS faculty who focus on record linkage