The Book of Statistical Proofs

Author

Name & contact: Joram Soch, joram.soch@bccn-berlin.de
Institution: Bernstein Center for Computational Neuroscience & Charité-Universitätsmedizin, Berlin

Links

Author: Webpage, ORCID, GitHub, Twitter
Project: Website, Repository, Twitter

Project management

To-do lists

~~To-do list until the end of 2019 (Dropbox)~~
To-do list from the beginning of 2020 (GoogleDocs)

Project reports

~~Preliminary report (for the kick-off event 2019/2020)~~
Interim report (published on 17.01.2020)
Final report (published on 28.05.2020)
~~Blog post (draft of 15.02.2020)~~
Blog post (version of 10.04.2020)
List of blog posts on the Wikimedia blog

Project description

This application is in English, but you can contact me in German.

Summary

Thanks to the increased application of advanced statistics (such as Bayesian inference and machine learning) and the increased availability of computing resources (such as computing clusters or high-performance GPUs), many disciplines have recently developed a “computational” branch: computational neuroscience, computational chemistry and computational sociology, to name just a few. In parallel, while a growing amount of data is being collected (“big data”), there are also growing concerns about the reproducibility of data analyses and the replicability of empirical studies (“replication crisis”). In this situation, the empirical sciences need sound methodology.

Sound methods development, in turn, almost universally relies on statistical theorems to justify the new statistical techniques proposed for analyzing empirical data. However, while most of this statistical theory is easy to retrieve, proofs of those theorems are often difficult to obtain, because they are contained in expensive books, locked behind publisher paywalls or scattered across multiple sources. This hinders the understanding of statistical theory and the development of cutting-edge techniques.

The core objective of the proposed project is to close this gap by collecting important statistical proofs into a centralized, open and collaboratively edited archive, The Book of Statistical Proofs. The primary goals within the proposed project are to develop a taxonomy of statistical proofs, to select an implementation for the proof archive, to collect as many proofs as possible from various sources and to engage other computational researchers to contribute to the project. A GitHub repository with a preliminary table of contents and exemplary statistical proofs is available to illustrate the project idea. Once the envisaged proof archive is established, the mathematical knowledge underlying the inference machinery of most of the empirical sciences will be transparent, comprehensible and freely available to computational researchers around the world.

Objectives

Within this project, I want to develop and establish a Wiki-like archive that collects important proofs from statistics and probability theory, to be used by all kinds of computational researchers when developing new methods for data analysis. To this end, I will (i) work out a taxonomy of statistical proofs, (ii) choose an implementation for the proof archive, (iii) collect statistical proofs from distributed sources and (iv) communicate the proof archive to the interested community.

Dissemination

As an expert on Bayesian model selection (BMS) for general linear models (GLM), I want to contribute my statistical knowledge to the envisaged archive. Besides this, I have already won over a colleague of mine to contribute proofs related to the multivariate general linear model (MGLM). The Bernstein Center for Computational Neuroscience in Berlin is a fairly large community of mathematically literate scientists. Once the archive is established, I plan to advertise it to colleagues at my institution and to reference it on posters presented at conferences as well as in papers submitted for peer review, such that the project attains its open science value and becomes a truly collaborative enterprise.

Reuse

As the material collected within this project will be assembled in a stable repository (see below), it will be available online for free to everyone, whether they are computational researchers or mathematically interested laypeople. Additionally, provided that the project has become collaborative by the end of the funding period, it will be supported by the community that edits and curates the archive (including myself).

Milestones

Based on the project goals outlined above, the project consists of four distinct work packages (WP), the completion of each of which constitutes an individual milestone (MS):

WP 1: to finalize the table of contents for The Book of Statistical Proofs, for which a preliminary draft already exists; to be completed by 09/2019.
WP 2: to decide on the actual implementation of the proof archive, candidates being a Wikibooks entry (such as this), a GitHub repository of LaTeX files (such as this) or GitHub pages with Jekyll integration (see here); to be completed by 09/2019.
WP 3: to collect proofs from distributed sources, such as textbooks, journal articles and Wikipedia articles; to be worked on until 06/2020.
WP 4: to communicate the proof archive to the community, e.g. via a review article, blog posts and Twitter communication; to be completed by 03/2020.

Use of funds

The proposed project is collaborative by nature. To facilitate and incentivize the otherwise unpaid work going into this open science resource, submitters will receive 25 € per proof for the first 200 proofs submitted to The Book of Statistical Proofs. As the principal investigator of this project, I will administer and transfer the money in a transparent way.

Contribution to the Wikimedia projects

Depending on which implementation is chosen for the proof archive (see above), the collected resources can be compiled into an open-content textbook (Wikibooks) or used for open-content community learning (Wikiversity). If it is successful, The Book of Statistical Proofs can also become a standard resource for probability theory that is referenced in the corresponding encyclopedia articles (Wikipedia).

Contribution to open science

The proposed project contributes to open science in three ways: First, important information that is, in part, the foundation of today’s statistical practice in the empirical sciences will no longer be contained in expensive books, locked behind publisher paywalls or scattered across multiple sources, but will be freely and openly available to everyone (open access). Second, these materials may be used for teaching the basics of probability theory in schools or at universities to increase statistical literacy among empirical scientists (open teaching). Third, computational researchers may base their methods development on the collected knowledge and resources (open methods).