Using Open Web Data to Assess and Improve Non-binary Gender Operationalization at Scale - Interim Report

From Wikiversity

I. Information about the research project (max. 3000 characters)

A. Status Quo

The goal of the project is to conduct and publish a study on using an inclusive gender operationalization when inferring gender from web-based sources. The proposed methodology is an attempt to provide an alternative to the commonly used binary automatic gender detection methods (for example, name-based unsupervised approaches implemented in Python (Gender Guesser) and R (Genderize)). While widespread automatic gender detection methods are often criticized for their accuracy and bias, the binary operationalization inherent in their design is often overlooked. As we know that gender is neither binary nor physiological, we should aim for inclusive measures in data collection efforts.

The study focuses on a case from the film industry and uses a sample of directors of 1,727 films selected from six relevant international film festivals. Instead of relying on names or pictures of individuals to measure gender, we analyze text data on directors’ self-representation on the web (e.g., Wikipedia pages, personal websites, interviews, and other online resources). The analysis focuses on the use of personal pronouns and other direct cues (e.g., a person explicitly mentions a gender-non-conforming identity). The results are then compared to manual and automatic name-based assignments of gender. The study is planned to be published open access, including the code.
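The pronoun and direct-cue analysis described above can be illustrated with a minimal Python sketch. It is not the project's code: the pronoun groups, the cue list, and the function names are assumptions made for illustration only. A simple token match like this cannot tell singular "they" from plural "they", so its output is at most a suggestion that still needs manual review.

```python
import re
from collections import Counter

# Illustrative pronoun groups (an assumption, not the study's coding scheme).
PRONOUN_GROUPS = {
    "he": {"he", "him", "his", "himself"},
    "she": {"she", "her", "hers", "herself"},
    "they": {"they", "them", "their", "theirs", "themself", "themselves"},
}

# Illustrative direct cues (a hypothetical list, not the study's codebook).
DIRECT_CUES = ("non-binary", "nonbinary", "genderqueer")


def pronoun_counts(text):
    """Count pronoun occurrences per pronoun group in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for group, forms in PRONOUN_GROUPS.items():
            if token in forms:
                counts[group] += 1
    return counts


def direct_cues(text):
    """Return the illustrative cue terms that occur in the text."""
    lowered = text.lower()
    return [cue for cue in DIRECT_CUES if cue in lowered]


def suggest_label(text):
    """Suggest a label for manual review; this is never a final assignment."""
    cues = direct_cues(text)
    if cues:
        return "direct cue: " + ", ".join(cues)
    counts = pronoun_counts(text)
    if not counts:
        return "no cue"
    # Several pronoun groups in one text are joined, e.g. "she+they".
    return "+".join(sorted(counts))


if __name__ == "__main__":
    print(suggest_label("She directed her first feature in 2010."))
    print(suggest_label("Alex is a non-binary filmmaker. They live in Berlin."))
```

Explicit cues take precedence over pronoun counts in this sketch because a self-description is a stronger signal than a pronoun that may refer to someone else in the text.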

B. Progress

Data collection and data analysis are completed. Text data on directors’ biographies and self-representations were identified and collected. The final dataset includes 1,435 film directors of 1,288 films. The data were analyzed manually. The analysis focused on the use of personal pronouns (he, she, they (used in the singular form), they+she, zhe) and direct cues (e.g., non-binary transgender, genderqueer transgender, non-binary). The self-representation method identified 1.4% of individuals who do not identify in binary categories. In addition, the results were compared with an analysis based on the binary gender detection methods implemented in Gender Guesser (Python) and Genderize (R).
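The comparison between self-representation labels and name-based binary output can be sketched as a small cross-tabulation. The sketch below is an assumption-laden illustration, not the project's analysis code: the director identifiers, the label values, and the function names are hypothetical, and the name-based labels are typed in by hand rather than produced by Gender Guesser or Genderize.

```python
from collections import Counter

# Labels treated as binary categories in this sketch (an assumption).
BINARY_LABELS = {"he", "she"}


def non_binary_share(self_labels):
    """Share of individuals whose self-representation label is not binary."""
    labels = list(self_labels.values())
    if not labels:
        return 0.0
    outside = sum(1 for label in labels if label not in BINARY_LABELS)
    return outside / len(labels)


def cross_tabulate(self_labels, name_labels):
    """Count (self-representation label, name-based label) pairs."""
    table = Counter()
    for person, self_label in self_labels.items():
        # People without a name-based result are counted as "unknown".
        table[(self_label, name_labels.get(person, "unknown"))] += 1
    return table


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    self_labels = {"d1": "she", "d2": "he", "d3": "they", "d4": "he"}
    name_labels = {"d1": "female", "d2": "male", "d3": "female"}

    print(non_binary_share(self_labels))
    for pair, count in sorted(cross_tabulate(self_labels, name_labels).items()):
        print(pair, count)
```

A cross-tabulation makes visible where a binary name-based method assigns a category to a person whose self-representation falls outside the binary, which a single accuracy figure would hide.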

C. Next steps

The next steps include finishing a paper draft and publishing the code. While it is not clear whether a paper can be published by the end of the Fellowship, the goal is to prepare a draft for submission and to exchange with and receive feedback from fellows and mentors. During the kick-off workshop, some mentors agreed to give feedback on specific aspects of the paper. No additional help from mentors or the Wikimedia team is needed.

II. Working with fellows and mentors (max. 3000 characters)

A. Working with your mentor

Working with my mentor Sandra Hofhues helped to reduce the initially outlined goals to realistic milestones. Regular monthly meetings, held by phone or online conferencing software, made it easier to discuss progress, challenges, and new ideas. Each meeting consists of a one-on-one session with the mentor, followed by a group session with another fellow. This way, it is possible to discuss both individual and common questions and challenges.

B. Exchange with other fellows

The exchange with other fellows took place mostly during the kick-off workshop. The structure of the event allowed an intensive exchange on various topics relevant to my project as well as on other research activities. The fellowship email list provides valuable information on some running projects and upcoming events. After the draft of the paper is finished, I plan to be more active in reaching out to the fellows for concrete feedback.

III. Communication and networking (max. 3000 characters)

A. Communication with respect to the Fellowship

So far, the fellowship program has primarily been discussed in informal settings with local colleagues, international partners, and other international colleagues (e.g., the Women and Gender Minorities in Digital Humanities working group). Prof. Dr. Loist and I are currently writing a blog post about the project in which the fellowship is explicitly mentioned. We are also planning a workshop in April at the Film University Babelsberg Konrad Wolf. The workshop will gather local and international film studies and media studies researchers working in the area of film festival research. Besides focusing on substantive and methodological questions, the workshop will be used to introduce the fellowship program.

B. Knowledge transfer

In the first part of the fellowship, knowledge transfer primarily took place in informal meetings concerning publications or project planning (e.g., taking the initiative to publish supplemental materials on Zenodo, including open science as an explicit topic in publications, and taking into consideration the existing legal framework for data collection and sharing).

The above-mentioned workshop (see III.A), planned for April 2020, will also be used as a platform to share the knowledge and experience gained in the area of open science, including legal aspects of data collection and data sharing as well as existing open science tools and practices.

C. New contacts in open science

In October, I took part in a webinar on Wikidata organized by Jakob Voß. The webinar provided insights into the Wikidata community. In the second part of the project, during the writing stage, there will be more opportunities to engage with the open science community.

D. New contacts in Wikimedia communities

Nothing beyond the activities described in section C.

IV. Promotion of open science (max. 4000 characters)

A. New initiatives in promoting open science

There are no new initiatives.

B. Initiatives to promote open science

As there are no explicit open science initiatives at my local institution, I see initiating and engaging in formal and informal discussions about open science practices as an important step in promoting open science. One potential contribution from my side is to advocate for open science practices in the planned meetings, at the workshop, and in informal settings. In addition, communicating via blog posts, and not only through scientific publications, can encourage more open and dynamic communication.