Data Dinner — April 2024

Written by:

Above, I have embedded a video recording of a session I led on using R to analyze tidy data. The audience was a group of engineers who generate high-dimensional datasets but often resort to brute force when visualizing them. The workshop focused on getting R set up, exploring generative AI as a helpful tool while you’re learning the syntax, and getting acquainted with ggplot2.

Outline:

  • 0:31 Introduction and overview
  • 3:46 Installing R and RStudio
  • 6:30 R workflow and initializing an R environment with projects
  • 16:02 Quarto markdown format and RStudio IDE
  • 19:49 Packages in R
  • 29:49 Introduction to Tidy Data
  • 39:49 Hands-on example with mtcars dataset
  • 40:01 Importing data
  • 51:24 Plotting with ggplot2
  • 56:04 Using AI assistants to help with R coding
  • 1:02:05 Grammar of Graphics Exploration

Key Takeaways:

  • R and RStudio provide a reproducible workflow for data analysis
  • R projects help manage dependencies and create self-contained, shareable analyses
  • Quarto markdown allows combining formatted text, code, and output in a single document
  • Packages extend R’s functionality for data manipulation, visualization, and specialized analysis
  • The tidyverse collection of packages, especially dplyr, tidyr, and ggplot2, enables powerful yet concise data manipulation and visualization
  • AI assistants can significantly speed up R coding by providing contextually relevant code snippets and modifications
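
To make the tidyverse takeaway concrete, here is a minimal sketch using the built-in mtcars dataset. It is not the exact code from the session; the column choices and the object names (cars_long, cars_summary, p) are my own assumptions for illustration. It assumes dplyr, tidyr, and ggplot2 are already installed.

```r
# Assumes the packages are installed, e.g. via install.packages("tidyverse")
library(dplyr)
library(tidyr)
library(ggplot2)

# mtcars keeps car names as row names; move them into a regular column
# and treat cylinder count as a categorical variable
cars <- mtcars %>%
  tibble::rownames_to_column("model") %>%
  mutate(cyl = factor(cyl))

# Reshape three measurements into a long layout:
# one row per model-and-measurement pair
cars_long <- cars %>%
  select(model, cyl, mpg, hp, wt) %>%
  pivot_longer(cols = c(mpg, hp, wt),
               names_to = "measure",
               values_to = "value")

# Mean of each measurement within each cylinder group
cars_summary <- cars_long %>%
  group_by(cyl, measure) %>%
  summarise(mean_value = mean(value), .groups = "drop")

# Build the plot in layers: data and aesthetics, a geom, facets, labels
p <- ggplot(cars_long, aes(x = cyl, y = value)) +
  geom_boxplot() +
  facet_wrap(~ measure, scales = "free_y") +
  labs(x = "Cylinders", y = "Value") +
  theme_minimal()

print(p)
```

The long layout is what lets a single facet_wrap() call draw one panel per measurement, rather than writing a separate plot for each column.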

Questions to Consider for Next Session:

  1. How can we apply these data manipulation and visualization techniques to real datasets from the lab?
  2. What are some best practices for creating publication-quality plots in R?
  3. How can we effectively combine data from multiple related experiments for joint analysis in R?
  4. How can we incorporate AI assistance into our R workflow without compromising code understanding and robustness?
