Course description

Principles, Statistical and Computational Tools for Reproducible Data Science

Today the principles and techniques of reproducible research are more important than ever, across diverse disciplines from astrophysics to political science. No one wants to do research that can’t be reproduced. This course is therefore for anyone doing data-intensive research. While many of us come from a biomedical background, this course is for a broad audience of data scientists.

To meet the needs of the scientific community, this course will examine the fundamentals of methods and tools for reproducible research. Led by experienced faculty from the Harvard T.H. Chan School of Public Health, you will participate in six modules that will include several case studies that illustrate the significant impact of reproducible research methods on scientific discovery.

This course will appeal to students and professionals in biostatistics, computational biology, bioinformatics, and data science. The course content will blend video lectures, case studies, peer-to-peer engagements and use of computational tools and platforms (such as R/RStudio and Git/GitHub), culminating in the presentation of a final reproducible research project.

We’ll cover Fundamentals of Reproducible Science; Case Studies; Data Provenance; Statistical Methods for Reproducible Science; Computational Tools for Reproducible Science; and Reproducible Reporting Science. These concepts are intended to translate to fields throughout the data sciences: physical and life sciences, applied mathematics and statistics, and computing.

Upcoming start dates

1 start date available

Start anytime

Self-paced Online
Online
English

Who should attend?

Prerequisites

Basic knowledge of R and Git
A computer on which you can download and run software

Training content

Introduction to Reproducible Science

Fundamentals of Reproducible Science

Definitions and Concepts
Factors affecting reproducibility

Case Studies in Reproducible Research

Data Provenance

Project Design
Journal Requirements
Repositories
Privacy and Security

Computational Tools for Reproducible Science

R and RStudio
Python, Git, and GitHub
Creating a repository
Data sources
Dynamic report generation
Workflows

An optional deeper dive into Statistical Methods for Reproducible Science

Prediction Models
Coefficient of determination
Brier score
Area Under the Curve (AUC)
Concordance in survival analysis
Cross-validation
Bootstrap
Simulations
Clustering

Course delivery details

This course is offered through Harvard University, a partner institution of edX.

3-8 hours per week

Costs

Verified Track - $99
Audit Track - Free

Certification / Credits

What you'll learn

Understand a series of concepts, thought patterns, analysis paradigms, and computational and statistical tools that together support data science and reproducible research.
Fundamentals of reproducible science using case studies that illustrate various practices
Key elements for ensuring data provenance and reproducible experimental design
Statistical methods for reproducible data analysis
Computational tools for reproducible data analysis and version control (Git/GitHub, Emacs/RStudio/Spyder), reproducible data (data repositories/Dataverse), reproducible dynamic report generation (R Markdown/R Notebook/Jupyter/Pandoc), and workflows.
How to develop new methods and tools for reproducible research and reporting
How to write your own reproducible paper.

edX

141 Portland Street

02139 Cambridge Massachusetts

617-440-9808

edx.business

edX

edX For Business helps leading companies upskill their labor forces by making the world’s greatest educational resources available to learners across a wide variety of in-demand fields. edX For Business delivers high-quality corporate eLearning to train and engage your employees...
