Population Informatics Lab

The seatbelts and airbags for the digital highway

Hye-Chung Kum (November 21, 2019)

Over 30,000 people die in road crashes each year in the US, and an additional 2 million are injured. Yet today almost all of us accept the risk of driving daily without being paranoid. We also actively manage the risk individually by obtaining car insurance, wearing seat belts, and not driving under the influence. As a society we pass laws such as insurance and seat belt requirements, texting bans, and penalties for driving under the influence. We do not routinely sue car manufacturers or those who built the roads when accidents occur. But we do hold car manufacturers accountable for faulty models, and we hold those responsible for road safety accountable when major road hazards are not handled properly. This is called due diligence. There is an expectation of due diligence and responsibility from the people and organizations that should play a role in road safety. Further, there is an expectation that car manufacturing companies and government agencies will invest in research and education on safer technologies to reduce the risk of driving.

It was not always like this, with people mostly on the same page and sharing the words and concepts to talk and think about road safety and risk management. In the early days when cars were introduced, most people had no concept of driving under the influence, airbags, or seat belt laws, and wondered about the risk of driving compared to using a wagon. It took time and trial and error to develop the words and concepts to be safer on the roads. But we know now that it is not about denying the real dangers of driving or being paranoid about them. The risk of car accidents is inherent in driving. We deal with this by managing the risk together, and by continuing to figure out how to manage it better. That means expecting transparency and accountability from all major players, such as car manufacturing companies and government agencies, to keep the roads safe, as well as continuously educating drivers and requiring personal responsibility from them.

In many ways, the digital highway is not too different from the physical highway. In the U.S. in 2017, there were 1,579 data breaches with close to 179 million records exposed. We are just starting out, with the Internet continuously evolving with more and more diverse activities occurring online. We do not yet have the words and concepts to really understand the potential risks and real harms from being on the digital highway -- the Internet -- but here is what we do know. Among the experts, many agree that there is no such thing as a safe database in use. Either it has been breached or it will be. The only question is when.

So what can the general public do? Stop using the digital highway? Just as it is not possible to live without driving in many parts of the world today, it is also not possible to not exist in the digital world. Even if one were to get rid of their smartphones and personal computers, almost all daily activities, such as going to school or work, shopping, or banking, produce a digital trace in the many databases out there. In fact, the day you are born, you start your digital trace with the birth record. We have no option to "clear" our traces from the databases in the digital society.

We also know of one concrete harm: identity theft, which can occur when certain data fall into the wrong hands. For it we have started to offer one concrete defense: identity monitoring programs that can at least alert you if something is amiss before it does real damage. There are other harms we understand less about how to handle, such as online bullying or systematic discrimination arising from opaque machine learning models, which may have serious consequences.

The risk to information privacy becomes even more complicated because there are no borders on the Internet. I can be attacked by people on another continent where no U.S. policy can reach them. The European Union has started to build a comprehensive framework for privacy protection with the implementation of the General Data Protection Regulation (GDPR), which went into effect on May 25, 2018. The fundamental protection framework is Privacy by Design. Privacy by Design is a concept that focuses on affordable privacy that goes beyond the narrow view of privacy as anonymity and attempts to meaningfully design privacy principles and data protection into the full system from beginning to end [Shapiro 2010]. For example, my own research [Kum et al. 2019, Ragan et al. 2018, Kum et al. 2014a] investigates designing a system for enhancing privacy in record linkage, which is connecting various pieces of information on a person from disparate sources. Sometimes called patient matching, record linkage is a critical task for following different care processes over time to inform policy decisions and clinical care decisions, such as counting how many people are readmitted after discharge. It is required to reap the benefits of big data through what I have termed Population Informatics [Kum et al. 2014b] when conducted safely in well-designed secure computing environments [Kum et al. 2013].
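To make the readmission example concrete, here is a minimal sketch of record linkage in Python. It is not the method used in the cited research: the field names, the 30-day window, and the exact-match rule on a normalized name plus date of birth are all assumptions chosen for illustration. Real linkage has to cope with typos, name changes, and missing values, which is why it often needs approximate matching and human review.

```python
from datetime import date

def normalize(name: str) -> str:
    """Lowercase a name and drop everything except letters."""
    return "".join(ch for ch in name.lower() if ch.isalpha())

def link_key(record: dict) -> tuple:
    """Hypothetical linkage key: normalized name plus date of birth."""
    return (normalize(record["name"]), record["dob"])

def count_readmitted(discharges: list, admissions: list, window_days: int = 30) -> int:
    """Count distinct people admitted again within window_days of a discharge."""
    readmitted = set()
    for d in discharges:
        for a in admissions:
            if link_key(d) != link_key(a):
                continue  # records do not refer to the same person
            gap = (a["admitted"] - d["discharged"]).days
            if 0 < gap <= window_days:
                readmitted.add(link_key(d))
    return len(readmitted)

# Made-up records from two separate sources (assumed layout).
hospital_a_discharges = [
    {"name": "Mary O'Neil", "dob": date(1950, 3, 2), "discharged": date(2019, 1, 10)},
    {"name": "John Smith", "dob": date(1980, 7, 15), "discharged": date(2019, 1, 12)},
]
hospital_b_admissions = [
    {"name": "mary oneil", "dob": date(1950, 3, 2), "admitted": date(2019, 1, 25)},
    {"name": "John Smith", "dob": date(1975, 7, 15), "admitted": date(2019, 1, 20)},
]

readmissions = count_readmitted(hospital_a_discharges, hospital_b_admissions)
```

Note that even this toy version has to look at names and birth dates to decide who is who, which is exactly where the privacy question arises.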

Specifically, GDPR Article 25 (data protection by design and by default) calls for minimum necessary disclosure (i.e., holding and processing only the data necessary to carry out the duties) and limiting access to personal data to only those who need it to carry out their duties. But the devil is in the details, because much judgement is required in determining what is necessary disclosure and who needs access. Protecting privacy means controlling what data can be accessed, who can access it, and for what purpose. These protections involve tradeoffs between providing too much access and too little. Too much raises privacy risks; too little can make data systems less useful and more costly in time and money to use. Maintaining privacy and confidentiality of data while having sufficient information for meaningful use requires a well-orchestrated system that was designed with privacy in mind from the get-go.
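As a rough illustration of minimum necessary disclosure, the Python sketch below releases only the fields that a stated purpose needs. The purposes, field names, and the policy table are hypothetical, and deciding what belongs in that table is precisely the judgement call described above.

```python
# Hypothetical policy: the fields each purpose is judged to need.
NEEDED_FIELDS = {
    "billing": {"patient_id", "insurance_id", "charges"},
    "readmission_study": {"patient_id", "admit_date", "discharge_date"},
}

def disclose(record: dict, purpose: str) -> dict:
    """Return only the fields needed for the purpose; unknown purposes get nothing."""
    allowed = NEEDED_FIELDS.get(purpose, set())
    return {field: value for field, value in record.items() if field in allowed}

# Made-up record for illustration.
record = {
    "patient_id": "P001",
    "name": "Mary O'Neil",
    "ssn": "000-00-0000",
    "insurance_id": "INS-42",
    "charges": 1250.0,
    "admit_date": "2019-01-03",
    "discharge_date": "2019-01-10",
}

study_view = disclose(record, "readmission_study")
marketing_view = disclose(record, "marketing")
```

Here a researcher studying readmissions never sees the name or the social security number, and a purpose that was never approved gets an empty result by default rather than the full record.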

Consider all the emails and updates you received from the many apps you use around May 25, 2018, requiring users to explicitly consent to and understand how their personal data is used. This was definitely a step in the right direction for ethical use of personal data and privacy protection, but with it came the burden for all users to learn how to use the airbags for the digital highway. This may be a requirement for being on the digital highway. The real question that still remains is to what extent users also have to understand the mechanisms of the digital highway to use it effectively. Even worse, does anyone really know how this is supposed to work? Information privacy is still a very active area of research that requires more maturity. The airbags in our cars keep us safe without our having to do anything, and manufacturers know exactly what they are trying to keep us safe from. The threats of the Internet, by contrast, are more elusive and are evolving as we speak, making it much more challenging to design effective safety mechanisms. But it is clear that the answer lies in the art of balancing usability and privacy through privacy by design. And for usable and affordable privacy, all stakeholders, including the public, will have to do their due diligence, enforced through transparency and accountability.


  1. Shapiro, S. Inside risks - Privacy by design: Moving from art to practice. Comm. of the ACM. 2010;53(6).
  2. Kum, H.-C., Ragan, E., Ilangovan, G., Ramezani, M., Li, Q., and Schmit, C. Enhancing Privacy through an Interactive On-demand Incremental Information Disclosure Interface: Applying Privacy-by-Design to Record Linkage. 2019 Symposium on Usable Privacy and Security (SOUPS). 23% acceptance rate (27/119).
  3. Ragan, E., Kum, H.-C., Ilangovan, G., and Wang, H. (2018). Balancing Privacy and Information Disclosure in Interactive Record Linkage with Visual Masking. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM. CHI 2018 Honourable Mention Award (top 5% of all submissions).
  4. Kum, H.-C., Krishnamurthy, A., Machanavajjhala, A., Reiter, M., and Ahalt, S. Privacy Preserving Interactive Record Linkage (PPIRL). J Am Med Inform Assoc. 2014;21:212–220. PMCID: PMC3932473. doi:10.1136/amiajnl-2013-002165
  5. Kum, H.-C., Krishnamurthy, A., Machanavajjhala, A., and Ahalt, S. Social Genome: Putting Big Data to Work for Population Informatics. IEEE Computer Special Outlook Issue. Jan 2014. p. 56-63.
  6. Wikipedia: Population Informatics
  7. Kum, H.-C., and Ahalt, S. (2013). Privacy by Design: Understanding Data Access Models for Secondary Data. American Medical Informatics Association (AMIA) Joint Summits on Translational Science: Clinical Research Informatics.
CITATIONS - The suggested way to cite the above is as follows: Kum, H-C. (2019). The seatbelts and airbags for the digital highway. Retrieved [month day, year], from Population Informatics Lab, URL: https://pinformatics.org/privacy-by-design.php