Appendix C: Tidyverse and Tibbles

C.1 Overview

The Tidyverse is a collection of R packages designed for data science.
They share a common design philosophy and work seamlessly together.

Core packages include:
- ggplot2: data visualization
- dplyr: data manipulation
- tidyr: data tidying
- readr: data import
- purrr: functional programming
- tibble: modern data frames
- stringr: string manipulation
- forcats: working with factors

You load them all with:

library(tidyverse)

C.2 What Are Tibbles?

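Tibbles are modern replacements for base R data frames. A tibble is still a data frame underneath (its class is c("tbl_df", "tbl", "data.frame")), so most functions that accept a data frame also accept a tibble.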

C.2.1 Key Features

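  • Never convert strings to factors (base R’s data.frame() did so by default before R 4.0.0)
  • Never change variable names: non-syntactic names such as "my var" are kept as written
  • Print in a cleaner, more readable way, with each column’s type shown under its name
  • When a tibble has more than 20 rows, print only the first 10, plus as many columns as fit on screen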

Example:

library(tibble)

tb <- tibble(
  x = 1:5,
  y = x^2,    # columns can refer to columns defined earlier in the same call
  z = c("a", "b", "c", "d", "e")
)

tb
# A tibble: 5 × 3
      x     y z    
  <int> <dbl> <chr>
1     1     1 a    
2     2     4 b    
3     3     9 c    
4     4    16 d    
5     5    25 e    
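To make the first key features concrete, the sketch below builds similar columns with data.frame() and tibble() and compares the results. The object names and the column name "my var" are illustrative choices only.

```r
library(tibble)

# data.frame() repairs non-syntactic names by default (check.names = TRUE)
df_names <- data.frame(`my var` = 1:2)
names(df_names)    # "my.var"

# tibble() keeps the name exactly as written
tb_names <- tibble(`my var` = 1:2)
names(tb_names)    # "my var"

# Strings stay character vectors; tibble() never turns them into factors
tb_chr <- tibble(z = c("a", "b"))
class(tb_chr$z)    # "character"

# More than 20 rows, so with default options only the first 10 are printed
tibble(i = 1:25)
```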

C.3 Differences from Data Frames

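  • $ and [[ extract a single column as a vector, just as with data frames
  • Tibbles don’t do partial matching: tb$a does not find a column named abc; it returns NULL with a warning, whereas a data frame silently returns abc
  • [ always returns a tibble: tb[, 1] is a one-column tibble, while df[, 1] on a data frame drops to a vector
  • Printing is truncated by default (no flooding the console)
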
tb$y
[1]  1  4  9 16 25
tb[["z"]]
[1] "a" "b" "c" "d" "e"
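The stricter subsetting rules are easiest to see side by side. This minimal sketch uses a hypothetical one-column table with the column name abc; the exact warning text printed for tb_abc$a may vary between tibble versions.

```r
library(tibble)

df_abc <- data.frame(abc = 1:3)
tb_abc <- tibble(abc = 1:3)

# $ on a data frame silently partially matches "a" to "abc"
df_abc$a           # 1 2 3

# $ on a tibble requires the exact name: it returns NULL and warns
tb_abc$a

# [ on a data frame drops a single selected column to a vector...
df_abc[, "abc"]    # 1 2 3

# ...but on a tibble it always returns a tibble
tb_abc[, "abc"]
```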

C.4 Creating Tibbles

You can create tibbles manually with tibble() or convert data frames with as_tibble().

df <- data.frame(a = 1:3, b = letters[1:3])
tb2 <- as_tibble(df)
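As a quick check, the sketch below confirms that as_tibble() adds the tibble classes on top of "data.frame". It also shows tribble(), another constructor in the tibble package that lets you type a small table row by row; the item and price columns are made-up example data.

```r
library(tibble)

df <- data.frame(a = 1:3, b = letters[1:3])
tb2 <- as_tibble(df)
class(tb2)    # "tbl_df" "tbl" "data.frame"

# tribble(): column names are written as ~name, then values follow row by row
tb_rows <- tribble(
  ~item,   ~price,
  "apple", 0.50,
  "pear",  0.75
)
tb_rows
```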

C.5 Working with Tibbles

Tibbles work seamlessly with all dplyr verbs:

tb3 <- tibble(
  x = 1:6,
  y = c("a", "a", "b", "b", "c", "c")
)

tb3 |>
  dplyr::group_by(y) |>
  dplyr::summarize(mean_x = mean(x))
# A tibble: 3 × 2
  y     mean_x
  <chr>  <dbl>
1 a        1.5
2 b        3.5
3 c        5.5
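Other dplyr verbs chain the same way, and each step hands a tibble to the next. This is a small sketch reusing tb3; the derived column name x_sq is an arbitrary choice.

```r
library(tibble)

tb3 <- tibble(
  x = 1:6,
  y = c("a", "a", "b", "b", "c", "c")
)

result <- tb3 |>
  dplyr::filter(x > 2) |>                # keep rows where x exceeds 2
  dplyr::mutate(x_sq = x^2) |>           # add a derived column
  dplyr::arrange(dplyr::desc(x_sq))      # sort by the new column, largest first

result    # still a tibble: rows with x = 6, 5, 4, 3
```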

C.6 Best Practices with Tibbles

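  • Prefer tibble() over data.frame() when creating new data, for predictable names and types
  • Avoid row names; store that information in an explicit column instead
  • Use glimpse() for a quick overview of every column’s type and first few values
  • Use print(tb, n = Inf) to see all rows when needed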
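The practices above can be combined in a few lines. This sketch uses the built-in mtcars data frame, which stores car names as row names; the new column name "car" is an illustrative choice passed to the rownames argument of as_tibble().

```r
library(tibble)

# Move the row names into an explicit "car" column while converting
car_tb <- as_tibble(mtcars, rownames = "car")

glimpse(car_tb)          # one line per column: name, type, first values
print(car_tb, n = Inf)   # print all 32 rows instead of the first 10
```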

C.7 When to Convert Back to Data Frames

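Most base R functions accept tibbles, but some older functions and packages rely on classic data-frame behavior (for example, that df[, 1] returns a vector, or that row names are present) and can fail or behave unexpectedly with tibbles.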
Use as.data.frame() if you need to revert:

df_back <- as.data.frame(tb)
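A short check shows what the conversion changes. This sketch rebuilds a numeric tibble like tb so it runs on its own, then confirms that the result is a plain data frame with the classic drop-to-vector behavior of [.

```r
library(tibble)

tb <- tibble(x = 1:5, y = (1:5)^2)
df_back <- as.data.frame(tb)

class(df_back)     # "data.frame" only; the tibble classes are gone
df_back[, "y"]     # a plain numeric vector again, not a one-column table
```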

C.7.1 In-Class Exercise

  1. Create a tibble with three columns: name, age, and score.
  2. Use mutate() to add a new column grade based on score.
  3. Group by grade and calculate the average age.

C.8 Conclusion

Tibbles are at the heart of the Tidyverse workflow, offering:

  • Clean printing
  • Safer subsetting
  • Compatibility with the pipe operator and dplyr verbs

Use them as your default data structure in this course.