Visualisation with ggplot2 and Good Coding Style

B1700, Week Two

Recap of Week One

What We Covered

  • Navigating RStudio (script, console, environment, plots/files)
  • Running code in the console and from a script
  • Using R as a calculator
  • Assigning variables with <-
  • Installing and loading packages
  • Importing a simple dataset (CSV)
  • Inspecting data with functions like head(), str(), summary()

R as a Calculator

2 + 2
[1] 4
3 * 5
[1] 15
10 / 3
[1] 3.333333
sqrt(25)
[1] 5

Variables & Assignment

  • Use <- to store a value in a name
  • Then you can reuse it later
x <- 10
y <- 3
result <- x * y

result
[1] 30

Packages

  • Install once, load every time you use them
install.packages("ggplot2")   # once per computer
library(ggplot2)              # every session

Importing Data

# Using readr (part of the tidyverse)
library(readr)
data <- read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv")

head(data)
# A tibble: 6 × 5
  sepal_length sepal_width petal_length petal_width species
         <dbl>       <dbl>        <dbl>       <dbl> <chr>  
1          5.1         3.5          1.4         0.2 setosa 
2          4.9         3            1.4         0.2 setosa 
3          4.7         3.2          1.3         0.2 setosa 
4          4.6         3.1          1.5         0.2 setosa 
5          5           3.6          1.4         0.2 setosa 
6          5.4         3.9          1.7         0.4 setosa 
  • read_csv() is tidyverse-friendly

  • head() shows the first few rows

Inspecting Data

str(data)       # structure of the dataset
spc_tbl_ [150 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ sepal_length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ sepal_width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ petal_length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ petal_width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ species     : chr [1:150] "setosa" "setosa" "setosa" "setosa" ...
 - attr(*, "spec")=
  .. cols(
  ..   sepal_length = col_double(),
  ..   sepal_width = col_double(),
  ..   petal_length = col_double(),
  ..   petal_width = col_double(),
  ..   species = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 

Inspecting Data

summary(data)   # quick stats
  sepal_length    sepal_width     petal_length    petal_width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
   species         
 Length:150        
 Class :character  
 Mode  :character  
                   
                   
                   
  • Helps spot problems (e.g. missing values, wrong data types)
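
As a small illustration of spotting these problems, the sketch below builds a made-up dataframe (players is hypothetical, not one of the module datasets) with one missing value and one column stored as the wrong type, then checks for both.

```r
# Hypothetical dataframe with two deliberate problems
players <- data.frame(
  name  = c("Alice", "Bob", "Charlie"),
  goals = c(3, NA, 2),          # one missing value
  shots = c("10", "12", "8")    # numbers stored as text by mistake
)

# Count the missing values in each column
colSums(is.na(players))

# Check the type of each column
sapply(players, class)

# Convert the text column to numbers, then inspect again
players$shots <- as.numeric(players$shots)
summary(players)
```

summary() reports missing values as NA's under each numeric column, so it is often the quickest first check.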

Running Scripts

  • Using menu commands
  • Using keyboard shortcuts

By this point you should be able to…

✅ Open RStudio and run code reproducibly from a script
✅ Assign variables and perform simple operations
✅ Install and load packages
✅ Import and inspect a dataset
✅ Understand the difference between console vs. script

Section One: Introduction

1.1 Overview

  • Today, we’ll focus on:
    • Data visualisation with ggplot2
    • Code style “best practices” for writing clean, readable, and efficient R code.

1.2 The Importance of Visualising Data

  • Visualisation is key to understanding data.
  • It allows us to uncover patterns, trends, and outliers.
  • A well-crafted visual makes complex data more accessible and easier to interpret.

Visualisation turns raw data into insights that can be easily understood.

1.3 Why Code Style Matters

  • Writing clean and readable code helps you:
    • Debug errors easily.
    • Make your code understandable for others (and yourself) in the future.
    • Promote collaboration in team-based projects.

1.4 Recap of Pre-Class Readings

  • Data Visualisation: Introduction to ggplot2, creating basic plots.
  • Workflow: Code Style: Best practices for writing clean, efficient code.

1.5 Building Confidence in R

  • We’re focussing on visualisation today.
  • But that’s not really the objective!
  • Instead, the aim is to build your confidence in R and RStudio and to understand how you can use them.

Section Two: Loading Data from a URL

2.1 Loading Data from a URL

  • In this module (and B1705), I use Dropbox to store datasets and make them available for you to download directly.
  • To get started, you need to be able to use read.csv() to load a dataset from a URL directly into R.

Here’s an example of how I would load the dataset:

# Load the dataset from a URL
url <- "https://www.dropbox.com/scl/fi/5cju035u9rnez7cfclqki/ah_data_01.csv?rlkey=qxo23mhfz15ol6cz4kl0smuc9&dl=1"
data <- read.csv(url)

# View the first few rows of the data
head(data)
  • First, we create an object called url that stores the Dropbox link as a text value.
  • Then, we create an object called data by running the function read.csv().
  • This function (more on functions later) needs just one thing here: the location of the file, which we supply by passing it the url object.
  • We can check that it has worked by (1) looking at our Environment window and (2) running head().

2.2 Pair Activity: Load the Dataset

Note

The term ‘observation’ refers to a row in a dataset.
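
Because each observation is a row, counting rows tells you how many observations you have. A quick sketch, using the built-in mtcars dataset as a stand-in (an assumption for illustration; any dataframe works the same way):

```r
# mtcars is one of R's built-in datasets
nrow(mtcars)   # number of observations (rows)
ncol(mtcars)   # number of variables (columns)
dim(mtcars)    # both at once: rows first, then columns
```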

2.3 Review of Dataset Loading

  • Was everyone able to load the dataset correctly?
  • Questions to Consider:
    • Did the dataset load correctly?
    • Did anyone encounter issues with the URL or dataset format?
  • Common problems:
    • Did you check the URL format?
    • Did you ensure the dataset is in CSV format?
    • Did you confirm you are using the correct function (read.csv())?

Getting a dataset into R correctly is the first step to being able to perform any analysis. And it’s often one of the hardest things to do.

2.4 Recap: Loading Data from Dropbox

  • Always ensure that the URL is correct and the dataset is accessible.
    • Dropbox: change dl=0 to dl=1 at the end of the link
    • Make sure you can use OneDrive to share files
  • Use read.csv() to load CSV datasets from internet/cloud storage, and write.csv() to save them.
  • Remember to use head() to quickly view the first few rows of the dataset.
  • Check you have a new dataframe in your Environment window.
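
The Dropbox change can be made by hand, or with a line of R. A minimal sketch, assuming a made-up share link (share_link below is hypothetical and will not download anything):

```r
# Hypothetical Dropbox share link, for illustration only
share_link <- "https://www.dropbox.com/scl/fi/abc123/example.csv?rlkey=xyz&dl=0"

# Swap dl=0 for dl=1 so the link points at the file itself
download_link <- sub("dl=0", "dl=1", share_link, fixed = TRUE)

download_link
```

The resulting download_link is the kind of value you would then pass to read.csv().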

I will usually refer to Data objects in the Environment as “dataframes” from now on…

2.5 Writing Data

  • You can write dataframes to files, so you don’t need to keep them in memory.

  • By default, the file is written to your working directory (usually your project directory).

# Create a small dataframe
scores <- data.frame(
  player = c("Alice", "Bob", "Charlie"),
  goals  = c(3, 5, 2),
  assists = c(1, 0, 4)
)

# Write to CSV
write.csv(scores, "scores.csv", row.names = FALSE)

# Check it worked (read it back in)
read.csv("scores.csv")
   player goals assists
1   Alice     3       1
2     Bob     5       0
3 Charlie     2       4

2.6 R and Multiple Dataframes

  • R can hold multiple dataframes in memory at the same time (unlike SPSS etc.). For this reason, you need to tell it which data to use when running code.

  • Some functions ask you to state the dataframe (as with write.csv(scores, "scores.csv") above).

2.7 Specifying variables

  • To refer to a specific variable in a specific dataframe, you need to use the dollar sign ($).

  • Type sum(HomeGoals) in the console, hit return, and see what happens. Then type sum(match_data$HomeGoals) and see what happens.
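
The same idea can be tried without downloading anything. This sketch reuses a cut-down version of the scores dataframe from 2.5 (only the player and goals columns are kept here):

```r
# Small dataframe, based on the scores example from 2.5
scores <- data.frame(
  player = c("Alice", "Bob", "Charlie"),
  goals  = c(3, 5, 2)
)

# sum(goals) on its own would stop with an "object not found" error,
# because goals only exists inside the scores dataframe

# The dollar sign tells R which dataframe the variable lives in
total_goals <- sum(scores$goals)
total_goals
```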

Section Three: Using ggplot2

3.1 Introduction to Plot Types in ggplot2

  • ggplot2 provides a variety of plot types to help you visualise different kinds of data.
  • The type of plot you choose depends on your data and the story you want to tell.
  • Each type of plot is suited for different kinds of relationships in your data.
  • You don’t have to use ggplot2 (base R has its own plotting functions), but it’s the most widely used plotting package in R.

3.2 Examples of Plot Types

  • Let’s take a look at some examples of common plot types using the mtcars dataset.

Tip

R comes with a selection of datasets ‘built in’, like mtcars. These are really useful if you want to experiment or play around with code.

Scatter Plot

Code
library(ggplot2)

# Scatter plot of mpg vs. hp
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "Scatter Plot: mpg vs. hp", x = "Horsepower", y = "Miles per Gallon")

Bar plot

Code
ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  labs(title = "Bar Plot: Count of Cars by Cylinders", x = "Number of Cylinders", y = "Count")

Line plot

Code
ggplot(mtcars, aes(x = seq_along(mpg), y = mpg)) +
  geom_line() +
  labs(title = "Line Plot: mpg Over Row Order", x = "Row Order", y = "Miles per Gallon")

Histogram

Code
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5) +
  labs(title = "Histogram: Distribution of mpg", x = "Miles per Gallon", y = "Frequency")

Box plot

Code
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(title = "Box Plot: mpg by Cylinders", x = "Number of Cylinders", y = "Miles per Gallon")

3.3 Recap of Plot Types

  • Scatter Plot: Relationship between continuous variables.
  • Bar Plot: Categorical data or counts.
  • Line Plot: Trends or continuous data over time.
  • Histogram: Distribution of a continuous variable.
  • Box Plot: Summary of continuous variable by group.

Section Four: More on ggplot2

4.1 The Core Components of ggplot2

  • ggplot2 is built around three key components:
    1. Data: The dataset that you are plotting.
    2. Aesthetics (aes): Mapping variables in the dataset to visual properties of the plot (e.g., x and y axes, colours, size).
    3. Geometries (geom_*): The visual elements representing the data (e.g., points, lines, bars).
  • Together, these elements form a flexible system that allows you to layer and customise plots.

The power of ggplot2 lies in how you combine data, aesthetics, and geometries to create customised visualisations.

4.2 Creating a Basic Plot

  • A basic ggplot2 plot follows a simple structure:
    1. Define the data: The dataset you are using (e.g., mtcars).
    2. Specify the aesthetics: Map variables to visual properties like axes and color.
    3. Add a geometry: Define what type of plot you want (e.g., scatter plot, bar plot).

Example: Scatter Plot

# Define the data and aesthetics, then add a scatter plot geometry
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "Scatter Plot: mpg vs. hp", x = "Horsepower", y = "Miles per Gallon")

Explanation:

  • ggplot(mtcars, aes(x = hp, y = mpg)): Specifies the data and maps hp to the x-axis and mpg to the y-axis.
  • geom_point(): Adds scatter plot points.
  • labs(): Customises title and axis labels.
  • The ggplot() function sets up the framework
  • geoms add the visual elements.
  • The combination creates the plot.

4.3 Layers in ggplot2

  • ggplot2 is based on layering elements one at a time.
  • This allows for fine-grained control over the final plot.

Basic Structure of Layers:

  • Base layer: Defined with ggplot(), setting data and aesthetics.

  • Geom layers: Added using functions like geom_point(), geom_line(), etc.

  • Additional layers: These might include labels, themes, or custom scales.

Example: Adding Multiple Layers using +

Code
# Adding points and a smoother line
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "blue") +  # Points layer
  geom_smooth(method = "lm", se = FALSE, color = "red") +  # Line layer
  labs(title = "Scatter Plot with Linear Regression Line", x = "Horsepower", y = "Miles per Gallon")

Explanation:

  • geom_point(color = "blue"): Adds blue scatter points.

  • geom_smooth(method = "lm", se = FALSE): Adds a linear regression line without the confidence interval (se = FALSE).

  • Each layer is stacked in sequence, giving you control over each visual element.

  • Layers let you add more elements to your plot, like trend lines or annotations, and adjust each one individually.
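
One way to see the layering is to store the plot in an object and add to it one step at a time. The object names below (base_plot, with_points, with_trend) are made up for this sketch:

```r
library(ggplot2)

# Base layer: data and aesthetics only (draws empty axes)
base_plot <- ggplot(mtcars, aes(x = hp, y = mpg))

# Add a points layer
with_points <- base_plot + geom_point()

# Add a straight trend line on top of the points
with_trend <- with_points + geom_smooth(method = "lm", se = FALSE)

# Print each object in turn to watch the plot build up
base_plot
with_points
with_trend
```

Storing intermediate plots like this is optional, but it can make it easier to see which layer is responsible for what.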

4.4 Customising Aesthetics and Geometries

  • Aesthetics (aes()): You can map not only position (x and y) but also other variables to colour, size, shape, etc.
  • Geometries (geom_*): You can customize visual properties of the geoms, such as size, colour, shape, and transparency.
  • ggplot2 accepts both the US and UK spellings (color and colour), so pick one and use it consistently.

Example: Customising Point Size and Colour

Code
# Map colour to 'cyl' and point size to 'wt'
ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl), size = wt)) +
  geom_point() +
  labs(title = "Customized Scatter Plot", x = "Horsepower", y = "Miles per Gallon")

Explanation:

  • color = factor(cyl): Maps cyl to color.

  • size = wt: Maps wt (weight) to point size.

Customising aesthetics lets us emphasise important patterns and relationships in the data.

4.5 Facets: Small Multiples for Comparison

  • Facets allow you to split a single plot into multiple subplots based on the values of one or more categorical variables.
  • You can use facet_wrap() or facet_grid() to create small multiples for comparison.

Example: Faceting by Number of Cylinders

# Facet the plot by the number of cylinders
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_wrap(~cyl) +
  labs(title = "Scatter Plot: mpg vs. hp, Faceted by Cylinders", x = "Horsepower", y = "Miles per Gallon")

Explanation:

  • facet_wrap(~cyl): Creates separate plots for each level of the cyl variable (e.g., 4, 6, and 8 cylinders).

Faceting is a great way to compare subsets of data side-by-side without ‘cluttering’ a single plot.
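
The example above uses facet_wrap(). For comparison, here is a rough sketch of facet_grid(), which lays panels out in rows and columns by two variables. The choice of am and cyl is just for illustration:

```r
library(ggplot2)

# Rows split by transmission (am: 0 = automatic, 1 = manual),
# columns split by number of cylinders
facet_plot <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl) +
  labs(title = "mpg vs. hp by Transmission and Cylinders",
       x = "Horsepower", y = "Miles per Gallon")

facet_plot
```

facet_wrap() is usually the simpler choice for one variable; facet_grid() tends to suit cases where two categorical variables are being compared.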

4.6 Pair Activity: Building Your First ggplot2 Plot

  • Objective:
    • First, you will download the football match dataset.
    • Then, you will create a plot to visualise the match outcomes and customize it with layers and faceting.

Task Steps:

  1. Download the Dataset:
    • Download the dataset from the URL: https://www.dropbox.com/scl/fi/wyrihmdl20gsftkhhai79/data_02.csv?rlkey=nh9zu2glcpw36qur81tjnpoip&dl=1
    • Load the dataset into R using the following code:
   # Load the football match dataset
   url <- "https://www.dropbox.com/scl/fi/wyrihmdl20gsftkhhai79/data_02.csv?rlkey=nh9zu2glcpw36qur81tjnpoip&dl=1"
   match_data <- read.csv(url)
   head(match_data)
  MatchID HomeTeam AwayTeam HomeGoals AwayGoals HomePossession AwayPossession
1       1   Team G   Team A         1         2             44             56
2       2   Team G   Team D         4         0             58             42
3       3   Team C   Team A         0         5             59             41
4       4   Team F   Team A         0         4             53             47
5       5   Team C   Team E         1         0             42             58
6       6   Team B   Team C         2         1             47             53
  HomeShots AwayShots       Date
1        14         5 2025-10-04
2        16        11 2025-02-24
3         6        15 2025-11-27
4        14        11 2025-08-26
5        10        17 2025-09-09
6        19        19 2025-12-05
  • Task: In pairs:
    1. Create a scatter plot, bar plot, or line plot based on the data.
    2. Customise the plot with at least two layers (e.g., adding a trend line or changing point size).
    3. Use faceting or other customisation options to highlight different aspects of the data.

4.7 Examples

1. Scatter Plot: Goals vs. Shots

We’ll start by showing the relationship between goals scored and shots taken for the home team.

Code
# Scatter plot of HomeGoals vs. HomeShots
library(ggplot2)

ggplot(match_data, aes(x = HomeShots, y = HomeGoals, color = HomeTeam)) +
  geom_point() +  # Scatter plot
  labs(title = "Home Team: Goals vs. Shots", x = "Home Team Shots", y = "Home Team Goals") +
  theme_minimal()

Explanation:

  • aes(x = HomeShots, y = HomeGoals): Maps HomeShots to the x-axis and HomeGoals to the y-axis.

  • geom_point(): Adds the scatter plot points.

  • color = HomeTeam: Colours the points by the HomeTeam variable, so each team is visually distinguishable.

2. Bar Plot: Goals Scored by Each Team

Next, we’ll create a bar plot to show the total number of goals scored by each team (both home and away).

Code
# Bar plot of total goals scored by each team
library(dplyr)
library(ggplot2)

# Stack the home and away records so each row is one team's goals in one match,
# then sum the goals per team (counting both home and away matches)
goals_by_team <- bind_rows(
  match_data %>% select(Team = HomeTeam, Goals = HomeGoals),
  match_data %>% select(Team = AwayTeam, Goals = AwayGoals)
) %>%
  group_by(Team) %>%
  summarise(TotalGoals = sum(Goals))

# Bar plot
ggplot(goals_by_team, aes(x = reorder(Team, TotalGoals), y = TotalGoals, fill = Team)) +
  geom_bar(stat = "identity") +
  labs(title = "Total Goals Scored by Each Team", x = "Team", y = "Total Goals") +
  theme_minimal() +
  coord_flip()  # Flip the axes for better readability

Explanation:

  • We use select() and bind_rows() from the dplyr package to reshape the data, stacking each match’s home and away records into two columns (Team and Goals).

  • The bar plot then shows the total number of goals scored by each team, ordered by the total goals.

3. Histogram: Distribution of Possession Percentages

Let’s explore the distribution of possession percentages for home teams with a histogram.

Code
# Histogram of HomePossession
ggplot(match_data, aes(x = HomePossession)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Distribution of Home Team Possession", x = "Home Team Possession (%)", y = "Frequency") +
  theme_minimal()

Explanation:

  • geom_histogram(): Creates a histogram to show the distribution of the HomePossession variable.

  • The binwidth argument controls the width of each bin in the histogram.

4. Faceting: Goals vs. Shots by Team

Now, we can use faceting to compare how the relationship between shots and goals varies across different teams.

Code
# Scatter plot of goals vs shots, faceted by HomeTeam
ggplot(match_data, aes(x = HomeShots, y = HomeGoals)) +
  geom_point(aes(color = HomeTeam)) +
  facet_wrap(~ HomeTeam) +  # Faceting by HomeTeam
  labs(title = "Goals vs. Shots by Team", x = "Home Team Shots", y = "Home Team Goals") +
  theme_minimal()

Explanation:

  • facet_wrap(~ HomeTeam): Creates a separate panel for each home team, so their performance can be compared.

  • aes(color = HomeTeam): Adds color to the points based on the home team.

5. Customising Plots (Adding a Trend Line)

Finally, let’s add a linear trend line to the scatter plot to visualise the general trend between shots and goals.

Code
# Scatter plot with trend line
ggplot(match_data, aes(x = HomeShots, y = HomeGoals)) +
  geom_point(aes(color = HomeTeam)) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +  # Adding a linear regression line
  labs(title = "Goals vs. Shots with Trend Line", x = "Home Team Shots", y = "Home Team Goals") +
  theme_minimal()

Explanation:

  • geom_smooth(method = "lm", se = FALSE): Adds a linear regression line to show the overall trend between shots and goals, without the confidence interval (se = FALSE).
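
The line that geom_smooth(method = "lm") draws comes from an ordinary linear model, which you can also fit yourself to see the numbers behind it. This sketch uses the built-in mtcars dataset as a stand-in so that it runs without a download; for the match data the equivalent call would be lm(HomeGoals ~ HomeShots, data = match_data).

```r
# Fit the same kind of straight line that geom_smooth(method = "lm") draws
trend_model <- lm(mpg ~ hp, data = mtcars)

# Intercept and slope of the fitted line
coef(trend_model)
```

The slope tells you how much the y variable changes, on average, for each one-unit increase in the x variable.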

4.7 Review and Discussion

  • Let’s review some of the visualisations from the pairs:
    • What customisations did you make? How did they help clarify the data?
    • What was challenging about working with layers and facets?
  • Solution Review: Discuss different approaches and improvements.

4.8 Recap: ggplot2 Deep Dive

  • Key Components: Data, aesthetics, and geometries are the building blocks of ggplot2.
  • Layers: Add and customize different visual elements, such as points, lines, and labels.
  • Facets: Split your data into multiple subplots for easier comparison.

4.9 Summary

In this section, we’ve:

  • Explored the core components of ggplot2.
  • Created basic plots and customised them with layers, aesthetics, and geoms.
  • Introduced faceting for visual comparison of data subsets.
  • Practised building customised plots.

Section Five: Coding Style

5.1 Why Coding Style Matters

  • Clean code is easier to understand, debug, and maintain.
  • It helps you collaborate with others, especially in team environments.
  • Readable code can be understood by others and by your future self.
  • Good organisation of code ensures that you can quickly identify errors and make updates.

5.2 Naming Conventions

  • Variable and function names should be descriptive and consistent.
  • Use snake_case (e.g., total_goals) or camelCase (e.g., totalGoals) based on your preferences.
  • Avoid using single-letter names like x or y unless they are loop variables or mathematical expressions.

Example:

Good practice

total_goals <- 10
calculate_average <- function(data) {
  mean(data$goals)
}

Bad practice

tG <- 10
calcAvg <- function(d) {
  mean(d$g)
}

5.2 What is a “Function”???

  • A function is a reusable piece of code that performs a specific task.
  • Functions take inputs (called arguments) and return an output.
  • Many functions are built into R (mean(), sum(), plot()), but you can also write your own.

Think of a function as a recipe: you give it ingredients (inputs), it follows a set of steps, and produces a dish (output).

5.3 Example: Using a Built-in Function

numbers <- c(10, 20, 30, 40)

mean(numbers)   # calculates the average
[1] 25
sum(numbers)    # adds them up
[1] 100

5.4 Creating Functions

add_two <- function(x) {
  return(x + 2)
}

add_two(5)   # returns 7
[1] 7
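
Functions can take more than one argument, and arguments can have default values. A small sketch along the same lines (add_n is a made-up name for this example):

```r
# n has a default value of 2, so it can be left out when calling the function
add_n <- function(x, n = 2) {
  return(x + n)
}

add_n(5)           # uses the default n = 2, so returns 7
add_n(5, n = 10)   # overrides the default, so returns 15
```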

5.5 Organising Code into Functions

  • Modular code is easier to debug and reuse. Break your code into functions that do one thing well.
  • Function names should be verb-based and descriptive of the task they perform.

Example:

Good practice: Function does one thing

calculate_goals_per_game <- function(goals, matches_played) {
  return(goals / matches_played)
}

Bad practice: Multiple tasks in one function

process_match_data <- function(data) {
  data$goals <- data$home_goals + data$away_goals
  data$shots <- data$home_shots + data$away_shots
  return(data)
}
  • Functions should encapsulate one specific task, making them easier to test and debug.

  • A function should not handle multiple unrelated tasks; separate them into distinct functions for clarity.
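
One possible way to split process_match_data() into single-purpose functions is sketched below. The function names and the two-match dataset are made up for illustration.

```r
# Each function adds exactly one new column
add_total_goals <- function(data) {
  data$goals <- data$home_goals + data$away_goals
  return(data)
}

add_total_shots <- function(data) {
  data$shots <- data$home_shots + data$away_shots
  return(data)
}

# Hypothetical two-match dataset to try the functions on
matches <- data.frame(
  home_goals = c(1, 4),
  away_goals = c(2, 0),
  home_shots = c(14, 16),
  away_shots = c(5, 11)
)

matches <- add_total_goals(matches)
matches <- add_total_shots(matches)
matches
```

Each function can now be tested on its own, and you can use one without the other.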

5.6 Commenting and Documentation

  • Comment your code to explain the why behind your decisions, especially in complex or non-obvious sections.
  • Avoid stating the obvious: a comment that only repeats what the code does adds nothing.

Example:

Good practice: Explaining why


# Calculate the average goals per match to assess performance over the season
average_goals <- calculate_goals_per_game(total_goals, total_matches)

Bad practice: Over-explaining

total_goals <- 10  # Assign 10 to total_goals

5.7 Consistency

  • Consistent code is predictable and makes collaboration easier.
  • Stick to a coding style guide for indentation, spacing, and other formatting.
  • Decide on whether to use spaces or tabs for indentation, and be consistent throughout.

Example:

Consistent formatting

calculate_goals_per_game <- function(goals, matches_played) {
  return(goals / matches_played)
}

Inconsistent formatting (Avoid this)

calculate_goals_per_game=function(goals,matches_played){return(goals/matches_played)}

5.8 Practical Activity: Refactor Messy Code (15 minutes)

Now it’s time for you to put these principles into practice by refactoring messy code.

Task:

  • Below is a piece of code that needs refactoring. Your task is to:
    1. Rename variables and functions to make them more descriptive.
    2. Break the code into functions where necessary.
    3. Add comments to explain what the code does and why.

Example of Messy Code:

calc <- function(d) {
  t <- sum(d$hgoals) + sum(d$agoals)
  avg <- t / length(d$hgoals)
  return(avg)
}

What you need to do:

  • Rename calc to something more descriptive (e.g., calculate_average_goals_per_match).
  • Rename d, t, and avg to more meaningful names.
  • Add comments explaining the purpose of the function.

Refactored code

# Average number of goals (home + away) scored per match
calculate_average_goals_per_match <- function(match_data) {
  total_goals <- sum(match_data$hgoals) + sum(match_data$agoals)
  average_goals <- total_goals / nrow(match_data)

  # Return the average goals per match
  return(average_goals)
}
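
Step 2 of the task (breaking the code into functions) could be taken further by separating the total from the average. This is one possible sketch, not the only valid answer; the function names and the three-match example data are assumptions, and the column names hgoals and agoals follow the messy code.

```r
# Total goals (home + away) across all matches
calculate_total_goals <- function(match_data) {
  sum(match_data$hgoals) + sum(match_data$agoals)
}

# Average goals per match, built on top of the total
calculate_average_goals <- function(match_data) {
  calculate_total_goals(match_data) / nrow(match_data)
}

# Hypothetical three-match dataset to try the functions on
example_matches <- data.frame(
  hgoals = c(1, 4, 0),
  agoals = c(2, 0, 5)
)

calculate_total_goals(example_matches)     # 12 goals in total
calculate_average_goals(example_matches)   # 4 goals per match
```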

5.9 Review and Discussion

  • Let’s go through the refactored code examples:
    • How did you improve variable and function names?
    • What parts of the code were broken into separate functions? Why?
    • How did you ensure that your code is readable and maintainable?

Solution Review: Walk through an example of the refactored code and explain the changes made.

5.10 Recap of Best Practices

  • Naming Conventions: Use descriptive names that are consistent and easy to understand.
  • Functions: Break your code into small, reusable functions that do one thing well.
  • Commenting: Use comments to explain why certain decisions were made, not what the code is doing.
  • Consistency: Ensure your code formatting is consistent throughout.

Section Six: Conclusion / Skills Checklist

From Week 1

  • Open RStudio and run code reproducibly from a script
  • Use R as a calculator
  • Assign variables with <- and reuse them
  • Install and load packages (install.packages(), library())
  • Import a dataset from file or URL (read.csv(), read_csv())
  • Inspect a dataset with head(), str(), summary()
  • Write a dataframe to file (write.csv(), write_csv())
  • Understand the difference between console vs. script workflows

From Week 2

  • Load a dataset directly from a URL into R
  • Create basic plots with ggplot2 (scatter, bar, line, histogram, boxplot)
  • Map variables to aesthetics (aes(x, y, colour, size))
  • Customise plots with labels, themes, scales, and layers
  • Use faceting to compare subsets of data
  • Apply code style best practices (naming conventions, functions, commenting, consistency)
  • Recognise how these building blocks can scale into professional apps (Shiny demo)