Reproducible Data Analysis: an Essential Capability in Modern Science

MC-RDA 2024-2025
English

The scientific method has always depended on other researchers being able to replicate and verify results. As scientific analysis becomes more complex and interdisciplinary, ensuring reproducibility becomes more challenging, especially in fields that combine different kinds of expertise. To promote transparency, consistency and robustness in science, journals, funders and institutions are encouraging the use of tools and practices that enhance reproducibility. Lifelong learning helps professionals keep up with fast-paced scientific developments and fosters creativity and innovation. By learning about version control, containers, pipelines and data reproducibility, scientists at all levels can improve the reproducibility of their research, as well as the impact and reliability of their findings and methods. Moreover, with the methods introduced here, they can collaborate and experiment in ways that allow reproducibility and creativity to coexist and thrive.

 

Content

This course introduces the tools and practices behind reproducible data analysis: version control with Git and GitHub, containers with Docker and Apptainer, and pipelines with Nextflow. Participants learn to use these tools both locally and on the VSC high-performance computing (HPC) facilities with their scheduling system. The course opens with successful examples of these tools being used for reproducibility and with the aspects of data reproducibility, and it ends with collaborative projects and an oral defence.

 

Detailed content

Session 1

- Successful examples of the use of these tools for reproducibility

- Aspects of reproducibility of data
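
One aspect of data reproducibility can be illustrated with a small sketch: recording a checksum for every input file, so that a later run can confirm it is working on exactly the same data. The example below is only an illustration and is not taken from the course material. The function names (`sha256_of_file`, `build_manifest`, `changed_files`) and the file name `measurements.csv` are hypothetical; only Python's standard library is used.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to `directory`) to its checksum."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files that were added, removed, or modified between two manifests."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(n for n in names if old_manifest.get(n) != new_manifest.get(n))


if __name__ == "__main__":
    # Hypothetical demo data in a temporary directory.
    with tempfile.TemporaryDirectory() as data_dir:
        data_file = Path(data_dir) / "measurements.csv"
        data_file.write_bytes(b"sample,value\nA,1.0\n")
        before = build_manifest(data_dir)

        # Simulate an unnoticed edit to the input data.
        data_file.write_bytes(b"sample,value\nA,1.5\n")
        after = build_manifest(data_dir)

        print("Changed files:", changed_files(before, after))
```

A manifest like this could be stored next to the analysis code under version control, so that any change to the input data shows up explicitly in the project history.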

Session 2

- Introduction to Git and GitHub concepts

- Routine usage of Git

- Inspect and compare different versions of a Git project

- Connecting and integrating to GitHub

- Collaborate and experiment with Git and GitHub

Session 3

- How to access VSC facilities and use the HPC scheduling

Session 4

- Introduction to basic container concepts and Docker syntax

- Find, obtain, and run a Docker image

- Adapt and build Docker recipes

- Find and run Apptainer images

- Adapt and build Apptainer images

- Use Apptainer in the VSC with the scheduling system

Session 5

- Introduction to Nextflow concepts and syntax

- Execute Nextflow pipelines with different executors and environments

- Write and run a Nextflow pipeline

- Write and modify modules and config files as best practice for pipeline development

- Use Nextflow in the VSC with the scheduling system

Session 6

- Projects:

* 2 small projects

o The Git & GitHub project consists of collaboratively creating your documentation under version control. The project is started during the lesson and finished asynchronously before delivery (estimated asynchronous time: 3h).

o The Docker and Apptainer project consists of adapting, writing, and building one Docker image based on a Docker recipe. The project must be delivered on GitHub, with the version history available. It is developed collaboratively after the lesson (estimated asynchronous time: 6h).

* 1 medium project

o The Nextflow project consists of using Docker or Apptainer images to create and run a Nextflow pipeline that uses config files and modules. The project must be delivered on GitHub, with the version history available.

o In addition, an oral presentation (defence) of the final project must include a summary of the topics learned and examples that demonstrate the use of the tools, with a focus on reproducibility.

o The project is developed collaboratively after the lesson (estimated asynchronous time: 8h).

 

Course prerequisites

* Ability to use simple shell commands (on Linux, for example). You can use this e-learning material to prepare.

* Experience with scripting is preferred (point to resource of catch-up before the course)

* Creating a VSC account

 

Final competences

  1. To use Git and GitHub for version control and for making code, text documents, and other appropriate data reproducible and easy to share;
  2. To use Git and GitHub for individual projects and for collaboration;
  3. To use, adapt and write containers locally and on a supercomputer, understanding the different systems and their particularities;
  4. To use and differentiate Docker containers and Apptainer images and their particular usage;
  5. To run, adapt and write Nextflow pipelines locally and on a supercomputer;
  6. To understand and make use of best practices for Nextflow pipelines, such as config files and modules.

 

Exam

Project evaluation and an oral evaluation.

 

Type of course

This is an on-campus course.

 

Course material

Syllabus, overheads, exercise handouts

 

Location

Technology park, 75 – CMB building (FSVM II), 9052 Ghent

Day 1: room L5, 5th floor

All other days: room L4, 4th floor

Dates

February 28, from 9.30 am to 5 pm

March 10 & 11, from 9.30 am to 5 pm

March 28, from 9.30 am to 1.30 pm

March 31 and April 1, 28 and 29, from 9.30 am to 5 pm

May 12, from 10 am to 5 pm

 

Price

  • €650

 

More info & Registration

UGent website: Universiteit Gent: Micro-credential Reproducible data analysis

Register via the tab: Off to a good start

Contact: science-academy@ugent.be