Appendix A: CS506: Data Wrangling and Management – Syllabus

A.1 Course Overview
CS506: Data Wrangling and Management introduces graduate students to data wrangling and data management using R and the Tidyverse ecosystem. Students will learn to import, manipulate, clean, and visualize data, with a strong emphasis on practical applications and reproducible workflows.
- CS 506, Fall 2025, 3 units
- Section 001: TuTh 9:35AM-10:50AM, Learning Resource Ctr Rm 106C
- Prerequisite: Graduate status
- Mode of Instruction: Face-to-face (in person)
- Instructor’s Name & Contact:
  - Marc Tollis (marc.tollis@nau.edu)
  - Room 209, SICCS (Building 90, second floor)
  - Office Hours: Tue 11AM-12PM
  - 928-523-3406
A.2 Canvas & Recorded Lectures
We will use the learning management system, Canvas, to conduct some course business, including distributing and submitting assignments. I will also use Canvas to record lectures for future viewing.
A.3 CS506 Book Website
I have compiled a course website with supplemental text and code examples that we will walk through in class. This website essentially serves as the course textbook and is required reading. Other required reading material will also be assigned.
A.4 Course Objectives
By the end of the course, students will be able to:
- Use R and RStudio for data analysis
- Import structured and unstructured data
- Clean and transform data using dplyr, tidyr, and other Tidyverse packages
- Create effective visualizations using ggplot2
- Perform exploratory data analysis (EDA)
- Apply data wrangling techniques to real datasets
A.4.1 Course Student Learning Outcomes
LO1. Compare and contrast major classes of and techniques for data handling (synthesis).
Students will be able to:
1. Identify various sources of data
2. Identify and utilize tool chains appropriate for accessing data
LO2. Design and enact data manipulation, analysis, and visualization workflows for large, heterogeneous datasets (application).
Students will be able to:
1. Aggregate data from multiple sources
2. Reshape data for further analysis
3. Validate data
4. Generate meaningful statistics summarizing the data
5. Visualize trends in data
LO3. Reason about advantages, preferred use cases, and weaknesses of various data manipulation techniques (application).
LO4. Develop a conceptual understanding of how the field of data management is evolving (knowledge).
Students will be able to:
1. Find and employ data management tools in R
2. Find and employ data visualization tools in R
A.4.2 Program Student Outcomes supported by this class
This course directly supports the following program student outcomes in the assessment and improvement plan of the Master of Science in Computational and Applied Data Science program:
SO2. Build the practical skills to explore, analyze, manage, and visualize large data sets using the latest technologies.
SO3. Evaluate and use well accepted methods to obtain, clean, pre-process, and transform data for further processing.
SO4. Apply data science and cutting-edge analytical methods to address data-rich problems from a variety of fields, think critically about data, and drive decision making.
SO7. Identify, appraise, and investigate ethical issues surrounding data collection, use, and data-driven decision making and to act in an informed and conscientious ethical manner.
A.5 Required Materials
- Textbook: R for Data Science (free online)
- Software: R, RStudio, and Quarto
A.6 Assessments
| Component | Weight |
|---|---|
| Problem Sets (14 total) | 30% |
| Quizzes (6 total, lowest dropped) | 50% |
| Workshops (2) | 15% |
| Attendance | 5% |
- Final grades are computed as the weighted sum of the components above and assigned on this scale: A ≥ 90%, B ≥ 80%, C ≥ 70%, D ≥ 60%, F < 60%.
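
As a rough illustration of the weighted sum, the formula below is a sketch only. It assumes each component is scored as a percentage from 0 to 100, and that the quiz score is the mean of the five quizzes that remain after the lowest is dropped. The symbols P (problem sets), Q (quizzes), W (workshops), and A (attendance) are introduced here for illustration and do not appear in the official grading policy.

$$
G = 0.30\,P + 0.50\,Q + 0.15\,W + 0.05\,A
$$

For example, hypothetical scores of P = 100, Q = 84, W = 90, and A = 100 would give G = 30 + 42 + 13.5 + 5 = 90.5, which falls in the A range under the scale above.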
A.7 Grading and Submission
- Problem Sets are simple assignments that will be completed on your own and submitted via Canvas.
- Problem sets are marked as complete or incomplete.
- Quizzes are written and completed in class.
- The final quiz is a case study project starting in class and due during finals week.
- Workshops will take place during class time, and attendance is required to earn the workshop grade.
- Workshop assignments will be submitted via Canvas.
- All Canvas-based assignments are due Sunday at 11:59PM of the week they are assigned (except the Mini Hackathon).
A.8 Course Schedule (Fall 2025)
| Week | Dates (T/Th) | R4DS Chapters | Topics | Assignments | Quiz |
|---|---|---|---|---|---|
| 1 | Aug 26 / 28 | Ch. 1 | Intro to R, RStudio, and Quarto: Projects, rendering .qmd to .pdf | PS1 | |
| 2 | Sept 2 / 4 | Ch. 2 – Data Visualization | Data Visualization with ggplot2: Aesthetics, geoms, facets | PS2 | |
| 3 | Sept 9 / 11 | Ch. 3 – Data Transformation | Data Transformation (Rows): filter(), arrange() | PS3 | Quiz 1 |
| 4 | Sept 16 / 18 | Ch. 3 – Data Transformation | Grouping & Summarization: group_by(), summarize() | PS4 | |
| 5 | Sept 23 / 25 | Ch. 5 – Tidy Data | Tidy Data | PS5 | Quiz 2 |
| 6 | Sept 30 / Oct 2 | Ch. 10 – Exploratory Data Analysis | Exploratory data analysis: distributions, patterns, relationships | PS6 | |
| 7 | Oct 7 / 9 | | Mini Hackathon (Tuesday) | Mini Hackathon | Quiz 3 (Thursday) |
| 8 | Oct 14 / 16 | Ch. 7 – Data Import | Data Import: readr, column types | PS7 | |
| 9 | Oct 21 / 23 | Ch. 12 through 18 – Transform | Logical Vectors and Numbers; Strings & Regular Expressions: stringr | PS8 | |
| 10 | Oct 28 / 30 | Ch. 12 through 18 – Transform (continued) | Factors & Categorical Data: forcats | PS9 | Quiz 4 (Thursday) |
| 11 | Nov 4 / 6 | Ch. 19 – Joins | Relational Data: joining tables (left_join, etc.) | PS10 | |
| 12 | Nov 11* / 13 | Ch. 20-24 – Advanced Importing | Advanced Importing, databases, web scraping | PS11 | |
| 13 | Nov 18 / 20 | TBD | TBD | | Quiz 5 (may be take-home) |
| 14 | Nov 25 / 27 | | Nov 25: Code Review Workshop; Nov 27: Thanksgiving – No Class | Code Review Workshop | |
| 15 | Dec 2 / 4 | | Course Wrap-up & Final Quiz | | Quiz 6 |
* Nov 11 (Veterans Day) – no class that Tuesday.
A.9 Resources
- RStudio Cheatsheets
- DataCamp & Coursera tutorials for extra practice
- Office hours for additional help
A.10 Policies
A.10.1 Course Policies
Students are encouraged to attend the instructor's office hours. If a student cannot attend regular office hours, an appointment may be arranged by email with sufficient advance notice.
Emails addressed to the instructor must be respectful and professional. The instructor will respond to emails promptly, within 2 business days. The instructor will generally not respond to emails on weekends or after working hours (i.e., in the evenings), so please plan accordingly.
Cheating, including plagiarism of writing or computer code, will not be tolerated. All academic integrity violations are treated seriously. Academic integrity violations will result in penalties including, but not limited to, a zero on the assignment, a failing grade in the class, or expulsion from NAU. The University’s Academic Integrity policies will be strictly enforced.
Each student is required to demonstrate respect towards their peers and the instructor. The instructor is held to the same standard.
The instructor will not provide copies of course notes. These materials should be sought from the students’ peers or by watching the recorded lectures.
Electronic device usage must support learning in the class. All cell phones, PDAs, music players and other entertainment devices must be turned off (or put on silent) during lecture.
Grades will be entered in Canvas. Please check LOUIE for your final grade.
Attendance: Active participation in coding activities is expected. Repeated, unexcused absences may affect the student’s grade.
Late Work: Accepted only with prior arrangement.
Academic Integrity: Students must adhere to NAU’s academic integrity policy.
A.10.2 University Policies
- Please see this document for all of the required Syllabus Policy Statements that equally apply to this course.
This syllabus is subject to minor adjustments. Updates will be announced in class and posted on Canvas.