User Study on Information Privacy: Debriefing
Thank you for your participation in our study.
We know it was a lot of data, decisions, and clicks.
But then again, in the digital age, that seems to be the trend.
- Some data is generated somehow (e.g., customer purchasing data on Amazon)
- The data is turned into useful information that is given to a person (e.g., recommendations of products based on what people like you bought)
- Then the person makes a decision, in theory based on the information given (e.g., "that looks like the right thing to buy")
- This is often followed by communicating the human decision back to a computer (e.g., clicks to purchase the product)
So, a better understanding of the following is important:
- How data should be turned into useful information, and how that information should be presented/visualized
- How people make decisions based on information
- How best to communicate those human decisions back to computer systems
The Population Informatics Lab is primarily interested in investigating these questions in the context of using person-level data for social good.
Person-level data has been used for marketing, campaigning, and intelligence with little transparency or control for many years. Yet current privacy protection strategies often prevent researchers from using similar information for social good. Population informatics
is the burgeoning field at the intersection of social, behavioural, economic, and health (SBEH) sciences, computer science, and statistics that applies quantitative methods and computational tools to answer questions about human populations (Kum et al. 2014). Population informatics uses social genome data (i.e., big data about people) responsibly to extract crucial insights into society’s most challenging problems. Such insights help us understand the root causes of social and public health problems, predict the downstream effects of different policies, identify upstream opportunities for interventions, and allocate our collective resources for the greatest impact.
Not surprisingly, record linkage (linking records that refer to the same person across different databases) is one of the core requirements for doing good research in population informatics. This also means we must grapple with the information privacy of the people in the data, a complex topic and a difficult challenge for all of us living in the digital era.
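For the curious, the basic idea of record linkage can be sketched in a few lines of code. This is a toy illustration only (not the software used in the study): it compares a few identifying fields between two made-up databases and links pairs whose fields mostly agree. Real systems use probabilistic models, blocking, and careful data cleaning.

```python
# Toy record linkage: score candidate pairs by how many identifying
# fields agree, and link pairs above a chosen threshold.
# (Illustrative only; field names and thresholds are made up.)

def similarity(rec_a, rec_b, fields=("name", "birth_date", "zip")):
    """Fraction of identifying fields on which two records agree."""
    matches = sum(rec_a.get(f) == rec_b.get(f) for f in fields)
    return matches / len(fields)

def link(db_a, db_b, threshold=0.6):
    """Return pairs of record IDs judged to refer to the same person."""
    links = []
    for a in db_a:
        for b in db_b:
            if similarity(a, b) >= threshold:
                links.append((a["id"], b["id"]))
    return links

hospital = [{"id": "H1", "name": "Ana Kim", "birth_date": "1980-02-01", "zip": "77840"}]
school   = [{"id": "S9", "name": "Ana Kim", "birth_date": "1980-02-01", "zip": "77845"},
            {"id": "S3", "name": "Bo Lee",  "birth_date": "1975-07-12", "zip": "77840"}]

print(link(hospital, school))  # [('H1', 'S9')]: 2 of 3 fields agree
```

Notice that hiding any one of the three fields (as in the de-identified condition of the study) makes such decisions harder, which is exactly the trade-off this study explores.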
It is increasingly difficult to "avoid" joining the digital society (e.g., by not joining LinkedIn), yet it is unclear what the impact of joining will be (e.g., all of a sudden it seems that anyone can find you and you cannot hide).
Some things you might want to know about information privacy:
- It has been proven mathematically that each release of data leads to some privacy loss while providing some utility for data analysis
- It is important to understand that information privacy is a budget-constrained problem, and thus privacy and use of the data MUST be balanced
- The goal is to achieve the maximum utility under a fixed privacy budget
- You cannot assume unlimited privacy in the digital world
- Thus, societies must consider and build consensus on what is acceptable use of person-level data for research
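The budget idea above can be made concrete with a small sketch. In differential privacy (the mathematical framework behind these results), the privacy losses ("epsilons") of successive data releases add up under sequential composition, so a fixed total budget caps how much analysis can be done. The class name and numbers below are illustrative, not a real library:

```python
# Sketch of a privacy budget: epsilons of successive releases add up,
# so a fixed total budget limits how many analyses can be answered.
# (Illustrative only; real accounting can be more sophisticated.)

class PrivacyBudget:
    def __init__(self, total_epsilon):
        self.total = total_epsilon  # the fixed privacy budget
        self.spent = 0.0            # privacy loss consumed so far

    def release(self, epsilon):
        """Charge a release against the budget; refuse if it would exceed it."""
        if self.spent + epsilon > self.total:
            return False  # budget exhausted: no further release allowed
        self.spent += epsilon
        return True

budget = PrivacyBudget(total_epsilon=1.0)
print(budget.release(0.4))  # True  (0.4 of 1.0 spent)
print(budget.release(0.4))  # True  (0.8 of 1.0 spent)
print(budget.release(0.4))  # False (would exceed the budget)
```

Once the budget is spent, no further data can be released without exceeding the agreed-upon privacy loss, which is why privacy and use of the data must be balanced rather than maximized independently.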
- There are two very opposite approaches to information privacy
- Limiting access to data (= hiding data), which seems more intuitive but carries high costs in data quality and data use (all the time and effort to manage who has access to what, when, and where).
- Information accountability (= transparency): hold people accountable for decisions made from data. This is how financial data is protected. For example, your credit score is not a secret, but any decision made using the credit score has to be transparent, and people have the right to know exactly how the score was calculated and to correct any errors in the data.
- Ultimately, there is no silver bullet for privacy-preserving computation. One has to design affordable, effective systems for a given problem by taking a holistic approach and combining different tools as needed.
The user study you participated in was specifically designed to better understand what data we can hide while still making good record linkage decisions, which ultimately will lead to high-quality, valid data for good research. We hope that some of you who were given less data had a chance to experience personally the costs of hiding needed information for privacy protection, such as lower quality of results or increased time and effort.
We have yet to analyze the results, but in essence, different people were given different interfaces and amounts of information for making the record linkage decisions.
- Some were given full data with no other help (no icons).
- Others were given literally legally de-identified data, where no names, IDs, or birth dates could be seen. They had only the icons to rely on when making decisions. You might be surprised to learn that even with such "little" information, most people got more than half of the linkages correct. So there was something useful in the icons.
- Others got something in between.
The results of this study will help our society find a better balance between privacy protection and the good use of person-level data for social good.
If you are curious and would like to experience how other interfaces work or want to share your experience with others, please check our
in September. We plan to make this user study publicly available as an exercise to experience and appreciate the complexities of information privacy.
Thank you again for your participation.