8 Workflow and Reproducibility
8.1 Learning Objectives
By the end of this chapter, you should be able to:
- Organize your work with R projects
- Use Quarto for reproducible documents
- Follow best practices for naming files and structuring directories
- Incorporate code, text, and output into a single reproducible report
- Use version control with Git and GitHub (optional, for advanced students)
8.2 Why Workflow Matters
Reproducible workflows:
- Make it easy to rerun analyses later
- Allow others to reproduce your results
- Keep projects organized and easy to navigate
- Prevent errors caused by hard-coded file paths and messy code
8.3 Organizing Projects in RStudio
8.3.1 RStudio Projects
- Use File → New Project for each analysis/course project
- Keep data, scripts, and outputs in subfolders (e.g., data/, scripts/, figures/, docs/)
- Avoid absolute paths; use relative paths inside the project
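A minimal sketch of the relative-path idea, assuming the working directory is the project root (opening the .Rproj file sets this for you). The file name data/raw_data.csv is only a placeholder; substitute your own file.

```r
# Relative path: resolved against the project root, so it works on any
# computer that has a copy of the project folder.
data_path <- file.path("data", "raw_data.csv")

# Absolute path (avoid): works only on the computer where it was written.
# The path below is a made-up example.
# data_path <- "C:/Users/me/Documents/my_project/data/raw_data.csv"

# Read the file only if it exists, so this sketch also runs in an empty project.
if (file.exists(data_path)) {
  raw_data <- read.csv(data_path)
} else {
  message("No file at ", data_path, " (relative to ", getwd(), ")")
}
```

Because file.path() builds the path from its pieces, the same line works on Windows, macOS, and Linux.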
8.3.2 Example Project Structure
my_project/
  data/
    raw_data.csv
  scripts/
    analysis.R
  figures/
    plot1.png
  docs/
    report.qmd
  my_project.Rproj
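The folders in this structure can also be created from R rather than by hand. The sketch below is one way to do it, under the assumption that a temporary directory stands in for your real project location; in practice you would point project_root at your own project folder. The .Rproj file itself is created by RStudio, not by this code.

```r
# Hypothetical project root; a temporary folder is used so the sketch is
# safe to run anywhere.
project_root <- file.path(tempdir(), "my_project")

# Subfolders from the example structure.
subfolders <- c("data", "scripts", "figures", "docs")

for (folder in subfolders) {
  # recursive = TRUE also creates project_root if it does not exist yet;
  # showWarnings = FALSE keeps reruns quiet when a folder is already there.
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List what was created.
list.files(project_root)
```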
8.3.3 In-Class Exercise 1 – Project Setup
- Create a new RStudio Project for this course.
- Make folders: data, scripts, outputs.
- Save your .qmd homework file in the project root.
- Render your Quarto document and confirm outputs stay organized.
8.4 Quarto for Reproducibility
Quarto allows you to:
- Combine text and code in one document
- Render reports to PDF, HTML, or Word
- Ensure results match the code that generated them
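To illustrate the last point, the sketch below builds a sentence from computed values instead of typing the numbers by hand, so the text cannot drift away from the data. It uses mtcars, a dataset that ships with R, as a stand-in for your own data; the variable names are illustrative.

```r
# mtcars ships with R; it stands in here for your own dataset.
mean_mpg <- mean(mtcars$mpg)
n_cars <- nrow(mtcars)

# Build the sentence from the computed values instead of typing the numbers.
summary_sentence <- sprintf(
  "The %d cars in the dataset average %.1f miles per gallon.",
  n_cars, mean_mpg
)
print(summary_sentence)
```

In a Quarto document, the same effect is usually achieved with an inline expression such as `r round(mean_mpg, 1)` inside a sentence, which is recomputed on every render.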
8.4.1 Example Quarto Workflow
---
title: "My Analysis"
format: pdf
---
8.4.2 In-Class Exercise 2 – Quarto Report
- Create a .qmd file that loads a dataset and runs a simple analysis.
- Add at least one plot and one table.
- Render to PDF and check the output.
8.5 Best Practices for Reproducibility
- Use scripts and Quarto documents instead of manual steps
- Keep raw data unchanged; clean data with scripts
- Document everything: use comments and text
- Save figures and tables programmatically, not manually
- Render final reports from source code
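Several of these practices can be sketched in a few lines of base R. The example below is only an illustration: it uses the built-in mtcars dataset in place of a raw data file, a temporary folder in place of your project's outputs folder, and a made-up cleaning rule.

```r
# A temporary folder stands in for the project's outputs/ folder.
out_dir <- file.path(tempdir(), "outputs")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Keep the raw data unchanged; do the cleaning on a separate object.
raw <- mtcars
clean <- raw[raw$hp < 200, ]  # hypothetical cleaning rule, for illustration only

# Save the figure with code instead of exporting it by hand.
fig_path <- file.path(out_dir, "plot1.png")
png(fig_path, width = 800, height = 600)
plot(clean$wt, clean$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
dev.off()

# Save the summary table with code as well.
summary_table <- aggregate(mpg ~ cyl, data = clean, FUN = mean)
table_path <- file.path(out_dir, "summary_by_cyl.csv")
write.csv(summary_table, table_path, row.names = FALSE)
```

Rerunning the script regenerates both files, so the figure and the table always reflect the current data and code.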
8.6 Optional: Version Control with Git and GitHub
For students interested in collaboration and tracking changes:
- Install Git and create a GitHub account
- Use usethis::use_git() to initialize Git in a project
- Commit changes regularly and push to GitHub
(We will not cover Git in detail, but this is recommended for your own practice.)
8.6.1 In-Class Challenge – Reproducible Mini-Report
- Set up a project with an organized folder structure
- Create a Quarto document that:
  - Reads a dataset
  - Runs a simple transformation
  - Creates a plot
  - Summarizes the results in text
- Render to PDF and check for a clean, reproducible output
8.7 Homework Preview
For homework, you will:
- Organize your project folder (data, scripts, outputs)
- Create a Quarto report with:
  - One dataset
  - At least one data cleaning step
  - One visualization
  - One table of summary statistics
- Ensure all file paths are relative (not absolute)
- Render to PDF and submit on Canvas
8.8 Next Steps
Next week, we will move into Data Import (CSV, Excel, and parsing dates) and continue to build your data wrangling workflow.