News and Announcements

All Announcements

  • Our paper, Enhancing Privacy through an Interactive On-demand Incremental Information Disclosure Interface: Applying Privacy-by-Design to Record Linkage, has been accepted at the Fifteenth Symposium on Usable Privacy and Security (SOUPS 2019), which had a 23% acceptance rate (27/119).
  • Our paper, Balancing Privacy and Information Disclosure in Interactive Record Linkage with Visual Masking, won the CHI2018 Honourable Mention Award (top 5% of all submissions).
  • Dr. Hye-Chung Kum has been approved for a $1 million funding award by the Patient-Centered Outcomes Research Institute (PCORI) to study privacy-enhanced techniques for interactive record linkage that would improve both privacy protection and accuracy when integrating heterogeneous data.
  • In the News

    Project Overview

    Secondary data (e.g. hospital discharge and insurance claims data) are now routinely used to conduct research that can inform improvements in health care delivery. Although many methodologies have been developed to link records from heterogeneous datasets, healthcare data are often redundant, incomplete, or erroneous, which makes linkage very difficult. Record linkage is necessary to conduct comprehensive research, and human involvement and interaction are crucial during this process. However, human interaction brings the issue of privacy to the forefront, and it is impossible to obtain the consent needed. The common practice is to obtain approval from the institutional review board (IRB) for a waiver of consent, then to allow a trusted party with full access to the data to link it, often using code developed in-house. Currently, there is no software that provides both privacy and high-quality linkage. The goal of this project is to design software that supports high-quality data integration for health research in a privacy-enhanced manner.

    Updates

    From September 1, 2019 to February 29, 2020

    During this period, we conducted and completed the final survey evaluating the Privacy Statement document (Aim 3) on an online platform, with data from two survey panels and more than 500 participants. The majority of participants found our document useful and preferable to a traditional format. Participants’ feedback was important for evaluating the document in a real-world setting, and we plan to incorporate some of their suggestions. We also successfully held two Combined Committee Meetings (Investigators and Stakeholders) during this study period.

    For Aims 1 & 2, we used MINDFIRL to deduplicate the ArthritisPower data, which contained 18,240 unique patient IDs, and generated 1,055 unique pairs of records that matched identically on multiple variables. We then applied the random forest model trained in the UTH study for automatic record linkage to these pairs, which identified 187 uncertain pairs that required manual review. Manual review resolved all but one pair.
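
    As a rough illustration of how candidate pairs can be generated from records that agree exactly on several variables, the sketch below groups records by a normalized key and emits every pair within a group. It is not MINDFIRL's code: the function name `candidate_pairs`, the field names, and the sample records are all hypothetical, and the variables actually used for ArthritisPower are not listed here.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, key_fields):
    """Return ID pairs of records that agree exactly on every field in key_fields.

    records: list of dicts, each with a unique "id" plus identifying fields.
    key_fields: hypothetical matching variables chosen for this example.
    """
    blocks = defaultdict(list)
    for rec in records:
        values = tuple(str(rec.get(f, "")).strip().lower() for f in key_fields)
        # Skip records with a missing key value so blanks never "match" each other.
        if all(values):
            blocks[values].append(rec["id"])

    pairs = set()
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return sorted(pairs)


# Made-up records for illustration only.
records = [
    {"id": 1, "last_name": "Smith", "dob": "1980-01-02"},
    {"id": 2, "last_name": "smith ", "dob": "1980-01-02"},
    {"id": 3, "last_name": "Jones", "dob": "1975-05-05"},
    {"id": 4, "last_name": "Smith", "dob": ""},
]
pairs = candidate_pairs(records, ["last_name", "dob"])
print(pairs)
```

    Grouping by key first avoids comparing every record with every other record, which matters once a dataset reaches tens of thousands of IDs.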

    We presented our poster at the annual AMIA conference in November 2019. We will present a poster on developing the automatic record linkage algorithm at the AMIA 2020 Informatics Summit. We also submitted the results of our ELSI Delphi study to the AcademyHealth Annual Research Meeting, which will be held June 13-16, 2020 in Boston, MA.

    From March 1, 2019 to August 31, 2019

    During this period, we facilitated and completed two Delphi studies with nationally recruited ELSI experts and patients who represented diverse sociodemographic and clinical profiles (Aim 3). Participants had a significant impact on the project by helping the research team identify issues, redundancies, unclear wording, and gaps in the initial drafts of the documents. We successfully drafted and finalized the documents that will accompany MINDFIRL for studies using record linkage once the software is released. The final drafts received positive feedback from the majority of patient and ELSI expert study participants.

    For Aims 1 & 2, we used our software to conduct a linkage study using a University of Texas at Houston dataset. We used 20,000 pairs of records of real data: 10,000 for training our models and 10,000 for testing. Our model classified pairs into 3 classes based on the linkage process: Matched, Uncertain, and No-match. Two teams participated in this study, each with 4 reviewers who were trained students or staff. Only 303 pairs needed manual review, and these were resolved by consensus among reviewers, indicating the effectiveness of MINDFIRL in conducting research that requires record linkage without fully disclosing private information.
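
    One common way to obtain the three classes from a probabilistic classifier is to apply two cut-offs to its match probability, sending only the middle band to human reviewers. The sketch below assumes that scheme; how the project's model actually assigns classes is not described here, and the `triage` function, its thresholds, and the scores are illustrative values only.

```python
def triage(match_probability, lower=0.2, upper=0.9):
    """Map a model's match probability to one of three linkage classes.

    lower and upper are illustrative cut-offs, not the project's settings.
    """
    if not 0.0 <= match_probability <= 1.0:
        raise ValueError("match_probability must be between 0 and 1")
    if match_probability >= upper:
        return "Matched"
    if match_probability <= lower:
        return "No-match"
    return "Uncertain"


# Made-up scores standing in for classifier output on three record pairs.
scores = {"pair-1": 0.97, "pair-2": 0.55, "pair-3": 0.03}
labels = {pair: triage(p) for pair, p in scores.items()}

# Only the uncertain pairs would be queued for manual review.
review_queue = [pair for pair, label in labels.items() if label == "Uncertain"]
print(labels)
print(review_queue)
```

    Widening the gap between the two cut-offs sends more pairs to reviewers and fewer to automatic decisions, so the thresholds trade reviewer workload against the risk of automated errors.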

    We presented our results from the General User Study (D8) at the AcademyHealth Annual Research Meeting in June 2019. We also presented the technical results at SOUPS 2019 in August. Our AMIA poster submission on quantifying the risk has been accepted, and we will present it in November 2019. We also submitted our work on developing the automatic record linkage algorithm from the summative evaluation (F2) to the AMIA 2020 Informatics Summit, which will be held March 23 – March 26, 2020 in Houston, Texas.

    From September 1, 2018 to February 28, 2019

    During this period, much of the work was software development to finalize the details for the first internal release of our software. We will use this software to conduct a linkage study using a UT Houston dataset of 20,000 potential pairs of real data composed of the eight fields most often present in patient records (i.e., first name, middle name, last name, date of birth, social security number, gender, primary address, and primary phone number). Our initial automatic record linkage of the 20,000 pairs resulted in 300 pairs that needed manual review using MINDFIRL. We are now deciding which of the 8 available fields are best suited for manual review, since the prototype software cannot handle all 8. We have also decided to change the name of the software from SDLink (Secured Decouple Linkage) to MINDFIRL (MInimum Necessary Disclosure for Interactive RL). The first internal release of the software has been made on GitHub.

    For our three-round Delphi study to develop a prototype IRB application template (which will accompany our software), the research team pilot-tested the Delphi instrument, which was primarily based on the results from the NGT sessions, with Ethical, Legal, and Social Implications (ELSI) experts. We have now completed the first 2 rounds of the Delphi study, and we are finalizing the third round.

    We presented our results from the NGT sessions and the General User Study at the APHA and AER conferences in November 2018. We have also submitted a paper to SOUPS (Symposium on Usable Privacy and Security) 2019 and a poster to the AcademyHealth Annual Research Meeting in June 2019. The research team plans to disseminate these results in peer-reviewed publications.

    From March 1, 2018 to August 31, 2018

    Our team presenting at the Conference on Human Factors in Computing Systems (CHI) 2018. Click here for video.

    Findings from the ELSI expert and patient NGT sessions indicate that enhancing and communicating privacy protection can eliminate existing barriers in the execution of research protocols and can enhance transparency and public trust in the scientific community. Evidence from our ELSI expert sessions suggests that the main benefit of using the PPIRL framework for database record linkage is the potential to facilitate the execution of research protocols (e.g. providing a tool for research to link and de-identify data in a privacy-enhanced way). ELSI experts considered evidence on the validity of the record linkage process, the administrative controls, and the data governance structures to be necessary for approving IRB applications for studies using our framework. Patients ranked minimum disclosure and comprehensive data protection as the most important factors related to the use of our framework. The most important concerns for the patient community were the need for checks and balances to ensure protection beyond the software and the potential for misuse of information by authorized users (e.g. negligence, insufficient training).

    From our General User study, we learned that the disclosure of PII can be limited to a great extent by using a clickable interface that lets users open up information on an as-needed basis. We quantified the risk of opening identifying information and found that the majority of participants used only a fraction of the budget provided to them while achieving accuracy equivalent to that of a fully open interface. However, we also learned that if we enforce a very tight limit on the budget, accuracy starts to suffer.
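
    The idea of opening information on request under a budget can be sketched as follows. This is a simplified illustration, not the MINDFIRL interface: the class `MaskedPair` is hypothetical, the budget here simply counts opened fields (the study's actual risk measure is not detailed in this update), and the records are made up.

```python
class MaskedPair:
    """A pair of records whose fields start masked and open on request."""

    def __init__(self, left, right, budget):
        # left and right are assumed to share the same field names.
        self.left = left
        self.right = right
        self.budget = budget  # maximum number of fields that may be opened
        self.opened = set()

    def view(self):
        """Return the pair as currently displayed, with unopened fields masked."""
        return {
            field: (self.left[field], self.right[field])
            if field in self.opened
            else ("****", "****")
            for field in self.left
        }

    def open(self, field):
        """Disclose one field if budget remains; return True on success."""
        if field not in self.left:
            raise KeyError(field)
        if field in self.opened:
            return True  # already open, costs nothing more
        if len(self.opened) >= self.budget:
            return False  # budget exhausted, field stays masked
        self.opened.add(field)
        return True

    def budget_used(self):
        """Fraction of the disclosure budget consumed so far."""
        return len(self.opened) / self.budget if self.budget else 0.0


# Made-up records for illustration only.
pair = MaskedPair(
    {"first_name": "Ann", "last_name": "Lee", "dob": "1990-03-04"},
    {"first_name": "Anne", "last_name": "Lee", "dob": "1990-03-04"},
    budget=2,
)
pair.open("dob")
pair.open("last_name")
denied = not pair.open("first_name")  # third request exceeds the budget
print(pair.view())
print(pair.budget_used(), denied)
```

    Tracking the fraction of budget used per reviewer is what makes it possible to compare disclosure against linkage accuracy, as the study did.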

    Additionally, we held our User Advisory Committee meeting to obtain feedback regarding our NGT sessions results and opinions and suggestions about our future work.

    Videos you can watch

    We also presented our poster on the NGT session findings at the AcademyHealth 2018 Annual Research Meeting (ARM) in Seattle, WA. The poster has also been accepted at the Advancing Ethical Research (AER) conference, where we will present the findings in November 2018.

    Our award winning research team at the SIGCHI conference, Montreal, Apr. 2018

    Image of our team at SIGCHI

    Left to right: Dr. Khairi Reda, the moderator and Dr. Hye-Chung Kum and Dr. Eric Ragan with the honorable mention award from SIGCHI

    From September 1, 2017 to February 28, 2018

    Findings from our first study demonstrate that with appropriate interface design, (1) even with access to only 30% of the information, people can make linkage decisions comparable to those of people with access to all information, and (2) even fully masked data can be used to make correct linkage decisions in many situations. This and other findings can be found in our paper on information privacy and record linkage, which won the CHI2018 Honourable Mention Award (top 5% of all submissions). Building on the initial interface design and tutorials from the first study period, we have started the stable code base for the prototype software on git. Using this code base, we ported user study 1.3 into an effective tutorial for learning the PPIRL framework, to be disseminated broadly. You can try it here.

    Using this tutorial, the main focus during this period was to facilitate the Nominal Group Technique (NGT) sessions with IRB experts to obtain feedback about using the PPIRL framework for research studies that include record linkage. We engaged the IRB experts to obtain feedback on the potential risks and benefits of the framework to inform the template IRB application that will accompany our software. We also presented our framework through a webinar to the Patient-Powered Research Networks (PPRNs) on February 20, 2018. The team is now able to move on to recruiting for and facilitating additional NGT sessions with patients in March to discuss (1) what they like, (2) what they are concerned about, and (3) what additional questions they have about the PPIRL framework.

    We also had our first User committee meeting to obtain opinions and feedback about the risks and benefits related to the framework. You can see notes here.

    Our research team at the AER Conference, Nov. 2017


    From March 1, 2017 to August 31, 2017

    The main focus during this period was to implement 5 different interface designs and test their effectiveness through user studies to determine the impact of hiding information during linkage decisions. There are three versions of the user study in which participants can get hands-on experience. User study 1.1 provides an in-depth experience of any given mode. User study 1.2 provides a comprehensive experience of all 5 modes (1 hour). Finally, user study 1.3 provides a shortened version of user study 1.2 (20 minutes). In this period, user studies 1.1 and 1.2, involving participants and an expert committee, were completed, and we are currently completing user study 1.3. The links are now available on our project website for anyone to experience the tradeoff between information disclosure and linkage decisions.
    Click here to try.

    We also had our first Methods committee meeting to seek feedback on the different design options. See notes here.