Open Science Preconditions: Legal issues III: Data privacy

VO Sharing is daring: Open Science approaches to Digital Humanities

Please read the lesson script below and complete the tasks.

Questions, remarks, issues? Participate in the Zoom meeting on Mon, 27.04.2020, 5 p.m. - 6 p.m.!

This week's topic of discussion:

Is data privacy compatible with Open Science? How can we protect our data subjects while still opening up our research?

Mon, 27.04., 16:45 - 18:15: Open Science Preconditions: Legal issues III: Data privacy

Before we can begin this session, there is some very sad news to share. Jon Tennant, a visionary and leading advocate of the Open Science movement, died unexpectedly in an accident. Jon was the founder of the Open Science MOOC (MOOC = "Massive Open Online Course"), a tremendously useful resource for teaching and learning Open Science. As we have not mentioned Jon in this course so far, we will take this very sad occasion to get acquainted with his work and his vision.

This is Jon's obituary, which was published by his colleagues and friends in the Open Science MOOC project:

In memoriam of Jon Tennant

We are deeply saddened by the sudden death of our colleague Dr Jonathan (Jon) Tennant.

Jon was a visionary, deeply committed to making science accessible to everyone. For years, he worked tirelessly to make the world understand the urgency of the issues, writing prolifically and sharing his vision with countless people.

In his vision of science accessible to all, Jon founded the Open Science MOOC, a platform dedicated to educating everyone interested in the different aspects of Open Science and its implications for knowledge creation and dissemination.

Thanks to Jon, the Open Science MOOC and the community he built around it are today a strong pillar in the movement towards fair, transparent, and collaborative research. We are grateful to have worked with him over the past years.

Jon will be sorely missed. Our heartfelt condolences to his family and loved ones.

A crowdfunding campaign to help Jon’s family bringing him back home to Leicester, UK, has been initiated here: https://www.gofundme.com/f/repatriation-burial-costs-fund-for-jon-tennant.

Julien Colomb, Ivo Grigorov, Ricardo Hartley, Jo Havemann, Lisa Hehnke, Bianca Kramer, Christopher Madan, Paola Masuzzo, Tobias Steiner, Erzsébet Tóth-Czifra, Rutger Vos, Danny Colin (Open Science MOOC: In memoriam of Jon Tennant)

We can learn from this text that Jon's ideal was to make research fairer and more transparent, but also to encourage researchers to leave the ivory tower and to work together. His goal was to make knowledge accessible to anyone who wants to learn. To support researchers in striving towards the same goals, he founded the Open Science MOOC.

Task 1

We will get to know the Open Science MOOC in more detail in the coming weeks, as we will work through some of the modules offered there to learn about Open Access to research papers, data discovery, Open Evaluation and more. This week, you are invited to pay a first visit to the MOOC to get to know its structure. To understand the vision Jon was trying to implement with this platform, watch his keynote from the DARIAH Annual Event 2018: "Open Science is just good science".

Task 2

In the past weeks, you registered on ORCiD and learned how to use it. Can you find Jon's ORCiD profile? What can you learn about his publication activities from his profile? Take a close look at the profile and notice all the different types of information Jon provided about himself and his publications. This is a true best-practice example of transparent, open communication in the scholarly sector.

Let's move on to the main topic of this session, which is data privacy. As you are probably aware, the European Union's General Data Protection Regulation (GDPR), which governs our rights and duties with respect to personal data, has applied since May 2018. One of the GDPR's main aims is to protect individuals against abuse of information about them, for instance by large international internet companies, but it affects pretty much everyone who works with "personal data" in one way or another in their daily practice. The GDPR states that

"The protection of natural persons in relation to the processing of personal data is a fundamental right." (GDPR, Recital 1)

This first sentence of the GDPR includes the three core concepts that determine the scope of the GDPR and that we have to understand in order to dive deeper into the rights and restrictions that the GDPR defines: "natural persons" (or "data subjects"), "processing", and "personal data". Time for definitions.

“Data subjects”: Although it might seem obvious to many, not everyone is aware of this basic but crucial fact: The GDPR only applies to natural persons. This means that the GDPR does not apply to legal persons (e.g. institutions or companies), but only to real people. It also means that it only applies to living people, not to historical subjects. As humanities researchers work with data about dead people more often than with data about living people, this might be a relief for some. However, while the GDPR does not protect data about dead people, other (national) laws might still restrict the collection and publication of such data (this is especially true for relatively recently deceased persons). The GDPR defines the rights that data subjects have with regard to their personal data. These include the right to information (e.g. about the data themselves, their processing and its purpose, their storage and its duration, their accessibility and their protection), the right to access the data (regardless of whether the data subject provided the data or they were taken from elsewhere) and to rectify them if necessary, the right to restrict their processing or to object to it, and, most importantly, the right to erasure. These rights of the data subject can be restricted to a certain extent by the research exceptions defined by the GDPR.

“Processing”: According to the GDPR, “data processing” is

“any operation or set of operations which is performed on personal data” (GDPR, Art 4(2)).

This means that any action you perform on data is part of the processing, including collecting, storing, and deleting data. This is also true for data in analogue form, provided they are part of, or intended for, a filing system (GDPR, Art 2).

“Personal data”: According to the GDPR, “personal data” are

“any information relating to an identified or identifiable natural person” (GDPR, Art 4(1)).

This means that any information relating to a person who is, or can be, identified qualifies. Examples include name, date of birth, age, sex, gender, address, pictures of the person, audio recordings of the person’s voice, etc. Be aware that a combination of data might make a person identifiable even if the person’s name is not included in the data collection; e.g. the information that one of your data subjects is male, 80 years old, and lives in a village with 300 inhabitants will likely make the data subject identifiable. Also be aware that the form or format of the data is largely irrelevant. The GDPR protects personal data in digital form, but also on paper if they are part of, or intended for, a structured filing system, such as a box of index cards in your office (GDPR, Art 2).
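
The point about combinations of data can be illustrated with a small sketch. The records, the attribute names, and the helper unique_combinations below are hypothetical and only serve as an illustration; the sketch simply counts how many records share the same combination of attributes and reports the combinations that occur exactly once. A real assessment of identifiability would also have to consider information available outside the data set.

```python
from collections import Counter

# Hypothetical records: no names, only seemingly harmless attributes.
records = [
    {"sex": "male", "age": 80, "village": "A"},
    {"sex": "female", "age": 34, "village": "A"},
    {"sex": "female", "age": 34, "village": "A"},
    {"sex": "male", "age": 52, "village": "B"},
    {"sex": "male", "age": 52, "village": "B"},
]

# Attributes that are not names but may identify someone in combination
# (an assumption made for this illustration).
QUASI_IDENTIFIERS = ("sex", "age", "village")


def unique_combinations(rows, keys=QUASI_IDENTIFIERS):
    """Return the attribute combinations that are shared by exactly one record."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


singled_out = unique_combinations(records)
print(singled_out)
```

In this toy data set, the 80-year-old man from village A is the only person with his combination of attributes, so he could be singled out even though no name was recorded. Looking at the village alone would not single out anyone.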

Pseudonymized data (i.e. data that cannot be directly associated with an identifiable individual in their current state, but that could still be linked back to the person they describe with the help of additional information) are considered personal data by the GDPR. Anonymized data (i.e. data that cannot be associated with an identifiable individual, and for which this state is irreversible) are not personal data according to the GDPR.
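
The difference between the two concepts can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the record, the field names, and the helper functions pseudonymize and anonymize are hypothetical, and replacing names with a keyed hash is just one possible pseudonymization technique, not one prescribed by the GDPR. Whether the second result really counts as anonymous depends on what else is known about the data subjects.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key. Whoever holds it can recompute the pseudonym for a
# known name, so the pseudonymized data can be linked back to a person.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(name: str, key: bytes = SECRET_KEY) -> str:
    """Replace a name with a keyed hash (a stable pseudonym)."""
    return hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def anonymize(record: dict) -> dict:
    """Drop the name entirely and coarsen the age to a ten-year band."""
    decade = record["age"] // 10 * 10
    return {"age_band": f"{decade}-{decade + 9}", "topic": record["topic"]}


record = {"name": "Maria Example", "age": 47, "topic": "dialect use"}

# Pseudonymized: the name is hidden, but the link can be restored with the key.
pseudonymized = {**record, "name": pseudonymize(record["name"])}

# Anonymized (in this sketch): no name, no key, only coarse information left.
anonymized = anonymize(record)

print(pseudonymized)
print(anonymized)
```

As long as the key exists, the pseudonym can be recomputed for a given name and the record re-linked to the person, which is why pseudonymized data remain personal data. The second record contains nothing that could be reversed, but it may still be combined with other information, so removing names alone is not a guarantee of anonymity.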

Task 3

The GDPR is a very technical regulation. In order to get a good overview of all the relevant concepts and "vocabulary", read the summary "What is GDPR, the EU’s new data protection law?" (provided by the platform gdpr.eu). As a next step, please read the slides "What data protection is all about (and what has changed with the GDPR)" by Walter Scholger. If you speak German, you may watch Walter's video "European Data Privacy Legislation" instead.

Now that we have understood the basic principles and how the GDPR works in general, let's look at how this affects us specifically in the field of research. Firstly, the GDPR affects us as researchers, or rather the institution employing us, in the same way that it affects any company or public body that collects or processes personal data for its general tasks. For example, it affects us when we want to set up mailing lists, when we want to manage employee or student records, when we want to take photos of people who attend events that we organize, and so on. But in addition to this, it can also affect us in doing our "actual" work, which is research. Not only in the natural sciences or in medicine, but also in the humanities, we might have to collect personal data to answer research questions or to archive research materials. Let me give you a few example scenarios:

We could be linguists researching how small children acquire their first language skills. To that end, we might take videos of children learning to talk which we then analyze and process. These videos (as well as the metadata telling us something about what/whom we see in the video) qualify as personal data.
We could be historians researching the Holocaust. We might digitize historical documents with lists of names of people who were murdered in concentration camps, or lists of names of the murderers. Some of the people named might still be alive. Thus, these lists qualify as personal data.
We could be sociologists documenting how social structures change and evolve in war zones. To that end, we might travel to Syria and document personal interactions by photographing people in the streets. These photos constitute personal data.

Task 4

Think of your own research interests. Can you come up with a (hypothetical) scenario in which your interests might make it necessary for you to collect and process personal data?

Luckily, not only researchers, but also legislators understand that research might make it necessary (and, even more importantly, justified) to process and archive personal data in ways that would not be acceptable for other purposes. Therefore, the GDPR defines "research exceptions" that widen the range of acceptable actions to be carried out on personal data for the purpose of research.

Task 5

Take a look at the paper "Language resources and research under the General Data Protection Regulation" by Paweł Kamocki (whom we have already met in one of the previous sessions) and his colleagues. Read section II, "Special rules concerning research under the GDPR". Think about your hypothetical research scenario from task 4 - are you allowed to follow your hypothetical plan thanks to the research exception, or do you need to take further data protection measures?

You will have noticed that one way to work with personal data despite the restrictions imposed by the GDPR is to collect the data subjects' consent (a consent that you yourself are very likely to be giving multiple times a day by clicking on "accept cookies", "I agree" and such). However, the collection of consent, or rather the documentation of the fact that consent was collected, is somewhat tricky, especially for researchers. In task 5, you read a paper provided by the research infrastructure CLARIN-ERIC. The second European research infrastructure that we have encountered in our past sessions has also identified the GDPR as an important topic for its community: DARIAH-EU has established a working group that deals with "Ethics and Legality in Digital Arts and Humanities" (ELDAH). ELDAH covers all areas of legality that are relevant to humanities researchers, i.e. in addition to data privacy also copyright and licensing. The current efforts of the working group, however, focus on the issue of obtaining consent from data subjects for research purposes - which is why ELDAH is developing the "Consent Form Wizard", a tool that will provide GDPR-compliant consent forms for researchers who need to collect their data subjects' consent.

In addition to CLARIN and DARIAH, there is another big European project which dedicates itself to Open Science exclusively: OpenAIRE.

On the one hand, OpenAIRE is a network of dedicated Open Science experts promoting and providing training on Open Science. On the other hand, OpenAIRE is a technical infrastructure harvesting research output from connected data providers. OpenAIRE aims to establish an open and sustainable scholarly communication infrastructure responsible for the overall management, analysis, manipulation, provision, monitoring and cross-linking of all research outcomes.

In one of our previous sessions, we have already worked with one of the most important resources now maintained by OpenAIRE: Zenodo. But in addition to technical infrastructure such as this repository, OpenAIRE also supports us in developing the skills we need for Open Science - such as knowledge of data privacy. To this end, OpenAIRE provides two "Legal Policy Webinars" this and next week (29 April, 2 p.m. and 4 May, 2 p.m. - duration: 90 minutes each). The webinars will explain what you need to know about the GDPR, so you are strongly encouraged to participate. Attendance is open to all and free of charge. Register by following this link.

Do you understand why data privacy is even an issue with regards to Open Science? Do you still have questions, remarks, issues about data privacy? Participate in the Zoom meeting on Mon, 27.04.2020, 5 p.m. - 6 p.m.!

By finishing this session, we have finally worked through all "Open Science preconditions". Starting next week, we will be putting Open Science into practice!

Reading & resources

Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation, GDPR).
Paweł Kamocki, Erik Ketzan, and Julia Wildgans. 2018. Language resources and research under the General Data Protection Regulation. CLARIN Legal Issues Committee CLIC White Paper Series, CLIC White Paper #3.