Wikiversity:Fellow-Programm Freies Wissen/Einreichungen/Exploring the landscape of open source software
Exploring the landscape of open source software

Project description

Part of open science practice, especially in computational research, is that all software components are open source. Only this can ensure full reproducibility, without any part of a workflow being a 'black box'. Furthermore, open source software ensures accessibility, as it does not require expensive licenses.

The landscape of open source software is vast, continuously changing, and difficult to navigate. This is largely due to the increasing number of software components being published, e.g. on the hosting platform GitHub. However, a detailed and up-to-date overview of the available software for a given scientific field or a specific application is often lacking or, where it exists, requires manual curation (e.g. Unakafova et al. 2019). A lot of code is produced that is redundant with an existing software base that is not visible enough. Moreover, it is not apparent which tools, packages, and libraries are actually used by the community.

This scenario is unfavorable for two reasons. First, publishing or contributing to a useful software package should be acknowledged in the same way as contributing to a research paper. Software must be cited, for example, so that funding can be acquired for its maintenance. Second, coding time is expensive, and it is worthwhile to evaluate where to allocate it best. Thus, the first step in creating any software component should be to assess what already exists and can be re-used, adapted, or extended.

Although such an overview is often lacking, there are metrics from which this information can be derived. Code repositories on GitHub offer various ways of credit assignment and thus metrics to assess the popularity of a software component (Stars, Watchers, Forks, Dependents) and its reusability or usefulness for an application (matching keywords, available documentation, compatible dependencies, code quality indicators such as unit tests, and whether the project is currently maintained).

In this project, I would like to use this information to explore, analyze, and visualize the landscape of open source projects. More precisely, I will conceptualize a tool for this purpose and develop a corresponding prototype. This tool will build on the GitHub API and its Python bindings to evaluate the current status of a list of open source projects, their interdependence, and their similarity to other projects on GitHub. In order to develop the exploration tool along a practical use case, I would like to expose the 'unofficial' base of Python packages in electrophysiology research; this may be further supported by additional metadata sources. Furthermore, I plan to collect the insights from the project in a practical guide on how to make software projects more visible and re-usable. The results and the process will be communicated in a series of posts on a personal blog and in talks at my institution and/or community events.

Author