Unsupervised learning techniques analyse data without predefined labels or outcomes, revealing hidden patterns and structures.
Clustering groups similar data points based on inherent characteristics.
Traditional clustering methods rely on predefined assumptions about the data, such as the shape of the clusters.
Unsupervised learning offers more advanced techniques that adapt to complex, high-dimensional data, improving cluster accuracy and interpretability.
Gaussian Mixture Models (GMMs) assume that data originates from multiple Gaussian (normal) distributions, each with distinct characteristics.
Unlike K-means, which assigns each point to a single cluster, GMM provides a probabilistic classification, allowing data points to belong to multiple clusters with varying probabilities.
Example
Libraries required
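The original code chunk isn't shown here; based on the functions used below, a minimal set of packages would be:
library(mclust)   # Mclust() for Gaussian Mixture Models
library(dplyr)    # bind_rows() and general data handling
library(ggplot2)  # visualisation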
1 Dataset creation
set.seed(42)
# Define n players
num_players <- 300
# Generate synthetic data with more overlap
# Attackers: High shots, high goals, moderate assists (increased SD)
attackers <- data.frame(
Shots = round(rnorm(num_players / 3, mean = 8, sd = 2.5),0),
Goals = round(rnorm(num_players / 3, mean = 5, sd = 2.5),0),
Assists = round(rnorm(num_players / 3, mean = 3, sd = 2.5),0),
Position = "Attacker"
)
# Defenders: Low shots, low goals, high assists (increased SD)
defenders <- data.frame(
Shots = rnorm(num_players / 3, mean = 3, sd = 2.5),
Goals = rnorm(num_players / 3, mean = 1, sd = 2.5),
Assists = rnorm(num_players / 3, mean = 5, sd = 2.5),
Position = "Defender"
)
# Goalkeepers: Very low shots, very low goals, low assists (increased SD)
goalkeepers <- data.frame(
Shots = round(rnorm(num_players / 3, mean = 1, sd = 2),0),
Goals = round(rnorm(num_players / 3, mean = 0, sd = 1.5),0),
Assists = round(rnorm(num_players / 3, mean = 1, sd = 2),0),
Position = "Goalkeeper"
)
# Combine into one dataset
water_polo_data <- bind_rows(attackers, defenders, goalkeepers)
rm(attackers, defenders, goalkeepers)
# View first few rows
head(water_polo_data)
Shots Goals Assists Position
1 11 8 -2 Attacker
2 7 8 4 Attacker
3 9 2 6 Attacker
4 10 10 8 Attacker
5 9 3 0 Attacker
6 8 5 0 Attacker
2 Fit GMM
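The fitting code isn't shown; a sketch using mclust (which selects the number of components and covariance structure by BIC) might be:
# Fit a Gaussian mixture to the three performance variables
gmm_model <- Mclust(water_polo_data %>% select(Shots, Goals, Assists))
summary(gmm_model)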
----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm
----------------------------------------------------
Mclust VII (spherical, varying volume) model with 3 components:
log-likelihood n df BIC ICL
-2206.83 300 14 -4493.513 -4576.572
Clustering table:
1 2 3
106 112 82
3 Add cluster labels to the dataset
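The code for this step isn't shown; one way to attach the hard assignments (assuming the gmm_model object above) is:
# Hard cluster assignment for each player
water_polo_data$Cluster <- as.factor(gmm_model$classification)
head(water_polo_data)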
4 Visualise clustering
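A possible visualisation, sketched with ggplot2 and the Cluster column created above:
ggplot(water_polo_data, aes(x = Shots, y = Goals, color = Cluster)) +
  geom_point(size = 3, alpha = 0.7) +
  ggtitle("GMM Clustering of Water Polo Players") +
  theme_minimal()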
5 Compare clusters to positions
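One simple check is a cross-tabulation of the known positions against the GMM clusters (a sketch):
# Contingency table: true position vs GMM cluster
table(water_polo_data$Position, water_polo_data$Cluster)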
6 Analysing probabilities of cluster assignments
Unlike K-means, GMM gives soft assignments: each player belongs to clusters with certain probabilities.
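The membership probabilities shown below are stored in the model's z matrix:
# Posterior membership probabilities (one row per player, one column per cluster)
head(gmm_model$z)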
[,1] [,2] [,3]
[1,] 0.9999996 1.163294e-10 3.790314e-07
[2,] 0.9985202 1.269046e-07 1.479720e-03
[3,] 0.9471837 3.412050e-06 5.281287e-02
[4,] 0.9999848 2.084930e-14 1.520399e-05
[5,] 0.9982464 6.070962e-05 1.692916e-03
[6,] 0.9988918 3.110449e-05 1.077056e-03
7 Advantages of GMM
GMM handles overlapping clusters well, as it assigns probabilities to each data point’s membership of the different clusters.
It’s more flexible than K-means, as it does not assume spherical clusters.
It provides probabilistic insights useful for decision-making - this is more subtle than simply allocating cluster membership on a yes/no basis.
Spectral clustering uses similarity matrices and eigenvectors to partition data.
Unlike traditional methods, it captures complex relationships, making it ideal for networked data (like football passing patterns).
For example, it could help identify strategic subgroups within teams that operate as cohesive units.
Example
Libraries required
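The chunk isn't shown; judging by the functions used below (get.knn, graph_from_adjacency_matrix, mlbench.twonorm), the packages are likely:
library(tidyverse)  # tibbles, dplyr and ggplot2
library(FNN)        # get.knn() for the k-nearest-neighbour graph
library(igraph)     # graph_from_adjacency_matrix() and graph plotting
library(mlbench)    # mlbench.twonorm() synthetic data
library(patchwork)  # combining ggplots
set.seed(42)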
1 Introduction
2 Generating dataset
n_players <- 25
cluster1 <- tibble(
Points_per_Game = rnorm(n_players/3, mean = 12, sd = 3.5),
Assists_per_Game = rnorm(n_players/3, mean = 4.5, sd = 1.8),
Rebounds_per_Game = rnorm(n_players/3, mean = 7, sd = 2.5),
Steals_per_Game = rnorm(n_players/3, mean = 1.4, sd = 0.4),
Blocks_per_Game = rnorm(n_players/3, mean = 1, sd = 0.3)
)
cluster2 <- tibble(
Points_per_Game = rnorm(n_players/3, mean = 15.5, sd = 3.5),
Assists_per_Game = rnorm(n_players/3, mean = 6.5, sd = 1.8),
Rebounds_per_Game = rnorm(n_players/3, mean = 9, sd = 2.5),
Steals_per_Game = rnorm(n_players/3, mean = 1.7, sd = 0.4),
Blocks_per_Game = rnorm(n_players/3, mean = 1.2, sd = 0.3)
)
cluster3 <- tibble(
Points_per_Game = rnorm(n_players/3, mean = 19, sd = 3.5),
Assists_per_Game = rnorm(n_players/3, mean = 8, sd = 1.8),
Rebounds_per_Game = rnorm(n_players/3, mean = 10, sd = 2.5),
Steals_per_Game = rnorm(n_players/3, mean = 2.0, sd = 0.4),
Blocks_per_Game = rnorm(n_players/3, mean = 1.4, sd = 0.3)
)
df <- bind_rows(cluster1, cluster2, cluster3) %>% mutate(Player_ID = row_number())
head(df)
# A tibble: 6 × 6
Points_per_Game Assists_per_Game Rebounds_per_Game Steals_per_Game
<dbl> <dbl> <dbl> <dbl>
1 16.8 8.13 6.29 2.16
2 10.0 4.39 0.359 1.23
3 13.3 6.85 0.899 1.30
4 14.2 8.62 10.3 0.695
5 13.4 2.00 6.23 1.58
6 11.6 4.00 2.55 1.14
# ℹ 2 more variables: Blocks_per_Game <dbl>, Player_ID <int>
3 Data visualisation - raw data
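The plotting code isn't shown; a simple scatter of two of the raw metrics might look like:
ggplot(df, aes(x = Points_per_Game, y = Assists_per_Game)) +
  geom_point(size = 3, alpha = 0.7, color = "steelblue") +
  ggtitle("Raw Player Performance Data") +
  theme_minimal()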
4 Constructing the similarity graph
X <- df %>% select(-Player_ID) %>% mutate(across(everything(), as.numeric)) %>% scale()
# Compute k-nearest neighbors adjacency matrix
k <- min(10, nrow(X) - 1) # Ensure k is valid
knn <- get.knn(X, k=k)
adjacency_matrix <- matrix(0, nrow=nrow(X), ncol=nrow(X))
for (i in 1:nrow(X)) {
adjacency_matrix[i, knn$nn.index[i, ]] <- 1
adjacency_matrix[knn$nn.index[i, ], i] <- 1 # Ensure symmetry
}
graph <- graph_from_adjacency_matrix(adjacency_matrix, mode = "undirected", diag = FALSE)
# Improved visualisation
plot(
graph,
vertex.size=10, # Adjust node size
vertex.label=NA,
edge.width=1.5, # Moderate edge width
main="Enhanced KNN Graph of Players",
layout=layout_with_kk(graph), # Kamada-Kawai layout - better spacing
edge.color="darkgrey", # Darken edges for clarity
vertex.color="steelblue" # Improve node visibility
)
5 Explanation
Each node represents a player, and edges indicate similarity based on performance metrics; players connected by edges have similar stats.
The graph captures local relationships between players using a k-nearest neighbors (KNN) approach, revealing natural groupings based on shared characteristics.
The structure of the graph helps define clusters in a way that traditional distance-based methods (like k-means) might miss, making it useful for identifying non-convex or irregular clusters.
6 Spectral Clustering - implementation
# Compute similarity matrix
similarity_matrix <- exp(-as.matrix(dist(X))^2 / (2 * sd(as.matrix(dist(X)))^2))
# Compute normalised Laplacian matrix
d <- rowSums(similarity_matrix)
D <- diag(d)
L <- D - similarity_matrix
D_inv_sqrt <- diag(1 / sqrt(d))
L_norm <- D_inv_sqrt %*% L %*% D_inv_sqrt
eigen_decomp <- eigen(L_norm, symmetric = TRUE)
# Use the eigenvectors associated with the smallest eigenvalues of the normalised Laplacian
n_eig <- ncol(eigen_decomp$vectors)
X_transformed <- as.data.frame(eigen_decomp$vectors[, (n_eig - 2):n_eig])
# K-Means clustering on eigenvectors
kmeans_res <- kmeans(X_transformed, centers = 3, nstart = 10)
df$Cluster <- as.factor(kmeans_res$cluster)
7 Clustering results
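The results code isn't shown; one way to inspect the spectral clusters just added to df:
ggplot(df, aes(x = Points_per_Game, y = Assists_per_Game, color = Cluster)) +
  geom_point(size = 3, alpha = 0.7) +
  ggtitle("Spectral Clustering of Players") +
  theme_minimal()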
8 Comparison with k-means
# Generate a synthetic two-class dataset with overlapping classes (mlbench.twonorm)
df_moons <- as.data.frame(mlbench.twonorm(100, d = 2)$x)
names(df_moons) <- c("x", "y")
# Apply K-Means
kmeans_res <- kmeans(df_moons, centers = 2, nstart = 10)
df_moons$kmeans_cluster <- as.factor(kmeans_res$cluster)
# Spectral Clustering
# Use only the coordinate columns (exclude the k-means labels added above)
dist_mat <- as.matrix(dist(df_moons[, c("x", "y")]))
similarity_matrix <- exp(-dist_mat^2 / (2 * sd(dist_mat)^2))
d <- rowSums(similarity_matrix)
D <- diag(d)
L <- D - similarity_matrix
D_inv_sqrt <- diag(1 / sqrt(d))
L_norm <- D_inv_sqrt %*% L %*% D_inv_sqrt
eigen_decomp <- eigen(L_norm, symmetric = TRUE)
# Take the eigenvectors associated with the smallest eigenvalues
n_eig <- ncol(eigen_decomp$vectors)
spectral_features <- as.data.frame(eigen_decomp$vectors[, (n_eig - 1):n_eig])
spectral_kmeans <- kmeans(spectral_features, centers = 2, nstart = 10)
df_moons$spectral_cluster <- as.factor(spectral_kmeans$cluster)
# Vis
p1 <- ggplot(df_moons, aes(x = x, y = y, color = kmeans_cluster)) +
geom_point(size = 3) +
ggtitle("K-Means Clustering") +
theme_minimal()
p2 <- ggplot(df_moons, aes(x = x, y = y, color = spectral_cluster)) +
geom_point(size = 3) +
ggtitle("Spectral Clustering") +
theme_minimal()
library(patchwork)
p1 + p2
9 How Spectral Clustering works
Data points are treated as nodes, and edges represent similarities.
Laplacian Matrix Calculation used to construct a matrix based on the similarity between nodes.
By analysing the eigenvectors of this matrix, data is projected into a lower-dimensional space where clustering becomes easier
Finally, standard clustering methods (like K-means) are then applied in this transformed space.
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) identifies clusters based on data density rather than relying purely on distances between data points.
Density-based approaches are effective at:
Identifying arbitrarily shaped clusters, including irregular or elongated patterns that traditional clustering algorithms, like k-means, often miss.
Handling noise and outliers effectively by classifying points in low-density regions as noise, ensuring robustness even when data includes irrelevant or anomalous points.
Detecting clusters without needing prior knowledge of the number of clusters, making them highly flexible and suitable for exploratory data analysis.
1 How DBSCAN Works
Core points: have at least minPts neighbours within a distance eps of themselves.
Border points: fall within eps of a core point but have fewer than minPts neighbours of their own.
Noise points: belong to neither category and are left unclustered.
2 DBSCAN in Action
rm(list=ls())
library(dbscan)
# Create synthetic dataset with two distinct clusters and noise
df_dbscan_demo <- tibble(
x = c(rnorm(50, mean = 2, sd = 0.5), rnorm(50, mean = 6, sd = 0.5), runif(20, min = 0, max = 8)),
y = c(rnorm(50, mean = 2, sd = 0.5), rnorm(50, mean = 6, sd = 0.5), runif(20, min = 0, max = 8))
)
# Apply DBSCAN
dbscan_res_demo <- dbscan(df_dbscan_demo, eps = 0.8, minPts = 5)
df_dbscan_demo$cluster <- as.factor(dbscan_res_demo$cluster)
# Visualising DBSCAN clustering
ggplot(df_dbscan_demo, aes(x = x, y = y, color = cluster)) +
geom_point(size = 3) +
ggtitle("DBSCAN Clustering") +
theme_minimal()
3 Comparing Clustering Methods
4 Libraries required
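The chunk isn't shown; for the comparison below, something like:
library(tidyverse)
library(dbscan)     # dbscan() and optics()
library(patchwork)  # combining the comparison plots
set.seed(42)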
5 Generating dataset
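The original dataset generation isn't shown. The later code expects a data frame df_clusters with x and y columns, so a purely illustrative stand-in (an assumption, not the original data) might be:
# Hypothetical stand-in: three 2-D Gaussian blobs plus a little uniform noise
df_clusters <- tibble(
  x = c(rnorm(50, 2, 0.5), rnorm(50, 6, 0.5), rnorm(50, 4, 0.5), runif(15, 0, 8)),
  y = c(rnorm(50, 2, 0.5), rnorm(50, 2, 0.5), rnorm(50, 6, 0.5), runif(15, 0, 8))
)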
6 K-Means Clustering
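The k-means code isn't shown; a sketch consistent with the later comparison plot (which uses a kmeans_cluster column):
kmeans_res <- kmeans(df_clusters[, c("x", "y")], centers = 3, nstart = 10)
df_clusters$kmeans_cluster <- as.factor(kmeans_res$cluster)
ggplot(df_clusters, aes(x = x, y = y, color = kmeans_cluster)) +
  geom_point(size = 3) +
  ggtitle("K-Means Clustering") +
  theme_minimal()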
7 Spectral Clustering
# Use only the numeric coordinates (cluster-label columns added earlier are factors)
dist_mat <- as.matrix(dist(df_clusters[, c("x", "y")]))
similarity_matrix <- exp(-dist_mat^2 / (2 * sd(dist_mat)^2))
d <- rowSums(similarity_matrix)
D <- diag(d)
L <- D - similarity_matrix
D_inv_sqrt <- diag(1 / sqrt(d))
L_norm <- D_inv_sqrt %*% L %*% D_inv_sqrt
eigen_decomp <- eigen(L_norm, symmetric = TRUE)
# Take the eigenvectors associated with the smallest eigenvalues
n_eig <- ncol(eigen_decomp$vectors)
spectral_features <- as.data.frame(eigen_decomp$vectors[, (n_eig - 2):n_eig])
spectral_kmeans <- kmeans(spectral_features, centers = 3, nstart = 10)
df_clusters$spectral_cluster <- as.factor(spectral_kmeans$cluster)
ggplot(df_clusters, aes(x = x, y = y, color = spectral_cluster)) +
geom_point(size = 3) +
ggtitle("Spectral Clustering") +
theme_minimal()
8 DBSCAN Clustering
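The DBSCAN step isn't shown; a sketch consistent with the final comparison plot (the eps and minPts values are assumptions):
dbscan_res <- dbscan(df_clusters[, c("x", "y")], eps = 0.8, minPts = 5)
df_clusters$dbscan_cluster <- as.factor(dbscan_res$cluster)  # cluster 0 = noise
ggplot(df_clusters, aes(x = x, y = y, color = dbscan_cluster)) +
  geom_point(size = 3) +
  ggtitle("DBSCAN Clustering") +
  theme_minimal()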
9 Introducing OPTICS
OPTICS (Ordering Points To Identify the Clustering Structure) improves DBSCAN by using a hierarchical clustering approach, where points are ordered based on their density relationships rather than grouped directly.
OPTICS enhances DBSCAN by providing flexibility and clarity, particularly if we’re dealing with datasets containing clusters of varying densities or complex, nested groupings.
10 Example
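The example code isn't shown; a sketch using the dbscan package's optics() and extractDBSCAN() (the eps_cl threshold is an assumption):
# Order points by reachability, then extract DBSCAN-like clusters at a chosen threshold
optics_res <- optics(df_clusters[, c("x", "y")], minPts = 5)
optics_res <- extractDBSCAN(optics_res, eps_cl = 0.8)
df_clusters$optics_cluster <- as.factor(optics_res$cluster)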
11 OPTICS Reachability Plot
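The plotting code isn't shown; the dbscan package provides a reachability-plot method for optics objects:
# Valleys in the reachability plot correspond to dense clusters
plot(optics_res)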
12 Summary
library(patchwork)
p1 <- ggplot(df_clusters, aes(x = x, y = y, color = kmeans_cluster)) +
geom_point(size = 3) +
ggtitle("K-Means") +
theme_minimal()
p2 <- ggplot(df_clusters, aes(x = x, y = y, color = spectral_cluster)) +
geom_point(size = 3) +
ggtitle("Spectral Clustering") +
theme_minimal()
p3 <- ggplot(df_clusters, aes(x = x, y = y, color = dbscan_cluster)) +
geom_point(size = 3) +
ggtitle("DBSCAN") +
theme_minimal()
p4 <- ggplot(df_clusters, aes(x = x, y = y, color = optics_cluster)) +
geom_point(size = 3) +
ggtitle("OPTICS") +
theme_minimal()
p1 + p2 + p3 + p4
Unlike DBSCAN, OPTICS does not depend on a single global eps value, which reduces the need for parameter tuning.
These models are unsupervised machine learning methods; they lack predefined labels against which to test the model.
Therefore, model validation is essential to help us assess the quality and performance of our model.
Metrics used to do this include:
Silhouette Score: measures how well each point fits its own cluster relative to the next-nearest cluster - higher values are better (a short sketch follows this list).
Davies-Bouldin Index: compares within-cluster scatter to between-cluster separation - lower values are better.
Domain-Specific Comparisons: cross-checks results with subject knowledge to enhance reliability.
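As a brief illustration of the silhouette score mentioned above, a sketch using the cluster package (the data and number of clusters are assumptions):
library(cluster)
# Average silhouette width of a 3-cluster k-means solution; closer to 1 is better
X_val <- scale(df_clusters[, c("x", "y")])
km_val <- kmeans(X_val, centers = 3, nstart = 10)
sil <- silhouette(km_val$cluster, dist(X_val))
mean(sil[, "sil_width"])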
“Dimensionality reduction” means transforming high-dimensional data into a lower-dimensional space while preserving important patterns and structures.
It helps improve computational efficiency, reduce noise, and enhance visualisation.
Commonly used techniques include PCA (Principal Component Analysis), t-SNE (t-Distributed Stochastic Neighbor Embedding), manifold learning and neural networks.
1 Libraries
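The chunk isn't shown; for the PCA code sketched below, something like:
library(tidyverse)
library(factoextra)  # fviz_eig() and fviz_pca_biplot() helpers
set.seed(42)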
2 Generating dataset
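The generation code isn't shown. The loadings below refer to Feature1-Feature3, so an illustrative stand-in (an assumption; PC1 picks up the two correlated features, PC2 the independent one) might be:
df_pca <- tibble(
  Feature1 = rnorm(100, mean = 10, sd = 2),
  Feature2 = Feature1 + rnorm(100, sd = 1),  # correlated with Feature1
  Feature3 = rnorm(100, mean = 5, sd = 1.5)  # largely independent
)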
3 Performing PCA
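A sketch of the PCA step on the standardised features:
pca_res <- prcomp(df_pca, scale. = TRUE)
summary(pca_res)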
4 Scree Plot: variance explained
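A scree plot can be produced with factoextra (a sketch):
fviz_eig(pca_res, addlabels = TRUE)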
5 PCA Biplot
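A biplot sketch, again via factoextra:
fviz_pca_biplot(pca_res, repel = TRUE)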
6 PCA Component Loadings
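The loadings below are the columns of the rotation matrix:
pca_res$rotation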
PC1 PC2 PC3
Feature1 0.6970177 -0.1150203 0.70776878
Feature2 0.6957155 -0.1305247 -0.70635919
Feature3 -0.1736269 -0.9847505 0.01095666
7 Interpreting PCA Results
8 Conclusion
1 Example
2 Generating basketball dataset
# Simulated basketball player stats dataset
df_sports <- tibble(
Points_per_Game = rnorm(100, mean = 15, sd = 5),
Assists_per_Game = rnorm(100, mean = 5, sd = 2),
Rebounds_per_Game = rnorm(100, mean = 7, sd = 3),
Steals_per_Game = rnorm(100, mean = 1.5, sd = 0.5),
Blocks_per_Game = rnorm(100, mean = 1, sd = 0.5)
)
3 Performing t-SNE
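The t-SNE code isn't shown; the comparison below uses a tsne_df with Dim1/Dim2 columns, so a sketch with Rtsne (the perplexity value is an assumption) could be:
library(Rtsne)
tsne_res <- Rtsne(scale(df_sports), dims = 2, perplexity = 30)
tsne_df <- tibble(Dim1 = tsne_res$Y[, 1], Dim2 = tsne_res$Y[, 2])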
4 t-SNE Visualisation
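A simple visualisation of the embedding:
ggplot(tsne_df, aes(x = Dim1, y = Dim2)) +
  geom_point(color = "blue", size = 3, alpha = 0.7) +
  ggtitle("t-SNE Projection of Player Stats") +
  theme_minimal()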
5 Comparing PCA vs. t-SNE
# Perform PCA
pca_res <- prcomp(df_sports, scale. = TRUE)
pca_df <- as_tibble(pca_res$x[, 1:2]) %>% rename(Dim1 = PC1, Dim2 = PC2)
# PCA vs. t-SNE plot
p1 <- ggplot(pca_df, aes(x = Dim1, y = Dim2)) +
geom_point(color = "red", size = 3, alpha = 0.7) +
ggtitle("PCA Result") +
theme_minimal()
p2 <- ggplot(tsne_df, aes(x = Dim1, y = Dim2)) +
geom_point(color = "blue", size = 3, alpha = 0.7) +
ggtitle("t-SNE Result") +
theme_minimal()
library(patchwork)
p1 + p2
6 When to Use t-SNE
7 Conclusion
1 Libraries required
## Note: dimRed not currently available via CRAN, so...
## Install devtools if you haven't already
# install.packages("devtools")
## Load devtools
# library(devtools)
## Install the archived version of dimRed
# install_url("https://cran.r-project.org/src/contrib/Archive/dimRed/dimRed_0.2.6.tar.gz")
library(tidyverse)
library(Rtsne)
library(ggplot2)
library(factoextra)
library(umap)
library(igraph)
library(dimRed)
set.seed(42)
2 Generating Dataset
# Simulated basketball player stats dataset
df_sports <- tibble(
Points_per_Game = rnorm(100, mean = 15, sd = 5),
Assists_per_Game = rnorm(100, mean = 5, sd = 2),
Rebounds_per_Game = rnorm(100, mean = 7, sd = 3),
Steals_per_Game = rnorm(100, mean = 1.5, sd = 0.5),
Blocks_per_Game = rnorm(100, mean = 1, sd = 0.5)
)
3 Performing t-SNE
First, we’ll repeat the application of a t-SNE model to this new data:
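The repeated t-SNE step might look like (the perplexity value is an assumption):
tsne_res <- Rtsne(scale(df_sports), dims = 2, perplexity = 30)
tsne_df <- tibble(Dim1 = tsne_res$Y[, 1], Dim2 = tsne_res$Y[, 2])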
4 Performing UMAP
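The UMAP code isn't shown; the comparison plot expects a umap_df with Dim1/Dim2 columns, so a sketch with the umap package (loaded above) could be:
umap_res <- umap(scale(df_sports))
umap_df <- tibble(Dim1 = umap_res$layout[, 1], Dim2 = umap_res$layout[, 2])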
5 Performing Isomap
library(vegan)
library(ggplot2)
# Compute distance matrix
dist_matrix <- dist(df_sports, method = "euclidean")
# Apply Isomap
isomap_res <- isomap(dist_matrix, ndim = 2, k = 5)
# Extract embedded coordinates
isomap_df <- as_tibble(isomap_res$points)
# Rename dimensions
colnames(isomap_df) <- c("Dim1", "Dim2")
# Plot Isomap-transformed data
ggplot(isomap_df, aes(x = Dim1, y = Dim2)) +
geom_point(alpha = 0.7, color = "blue") +
labs(title = "Isomap Dimensionality Reduction",
x = "Dim1",
y = "Dim2") +
theme_minimal()
6 Comparing PCA, t-SNE, UMAP, and Isomap
# Perform PCA
pca_res <- prcomp(df_sports, scale. = TRUE)
pca_df <- as_tibble(pca_res$x[, 1:2])%>% rename(Dim1 = PC1, Dim2 = PC2)
p1 <- ggplot(pca_df, aes(x = Dim1, y = Dim2)) +
geom_point(color = "purple", size = 3, alpha = 0.7) +
ggtitle("PCA") +
theme_minimal()
p2 <- ggplot(tsne_df, aes(x = Dim1, y = Dim2)) +
geom_point(color = "blue", size = 3, alpha = 0.7) +
ggtitle("t-SNE") +
theme_minimal()
p3 <- ggplot(umap_df, aes(x = Dim1, y = Dim2)) +
geom_point(color = "green", size = 3, alpha = 0.7) +
ggtitle("UMAP") +
theme_minimal()
p4 <- ggplot(isomap_df, aes(x = Dim1, y = Dim2)) +
geom_point(color = "red", size = 3, alpha = 0.7) +
ggtitle("Isomap") +
theme_minimal()
library(patchwork)
p1 + p2 + p3 + p4
7 When to use each method
8 Conclusion
An autoencoder is a type of neural network that learns to compress input data into a lower-dimensional “latent space” (encoding) and then reconstruct the original data (decoding).
It’s trained to minimise the difference between the input and its reconstruction, known as the reconstruction error.
Example
In the example, I train an autoencoder on a synthetic dataset.
The network learns the typical patterns in the data (such as typical game performance in terms of points, assists, rebounds, etc.) by encoding the input into 3 latent features and then reconstructing the original 5 features.
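The training code isn't shown; a minimal sketch with the keras R package (layer sizes, epochs and batch size are assumptions, and df_sports stands in for the synthetic dataset):
library(keras)
# Scale the 5 performance features to a numeric matrix
x <- scale(as.matrix(df_sports))
# Encoder: 5 features -> 3 latent features; decoder: 3 -> 5
input   <- layer_input(shape = 5)
encoded <- input %>% layer_dense(units = 3, activation = "relu")
decoded <- encoded %>% layer_dense(units = 5, activation = "linear")
autoencoder <- keras_model(inputs = input, outputs = decoded)
autoencoder %>% compile(optimizer = "adam", loss = "mse")
# Train the network to reconstruct its own inputs
autoencoder %>% fit(x, x, epochs = 50, batch_size = 16, verbose = 0)
# Per-player reconstruction error
recon <- predict(autoencoder, x)
reconstruction_error <- rowMeans((x - recon)^2)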
Example
Reconstruction error as a signal
The reconstruction error measures how well the autoencoder can recreate the input data.
Low Reconstruction Error indicates that the input follows the learned normal patterns.
High Reconstruction Error suggests that the input is unusual or anomalous because it deviates from the patterns the autoencoder learned.
Usefulness?
Anomaly Detection: Analysts can use high reconstruction errors as flags for unusual events or changes in the data (a short sketch follows this list).
Data Compression: The latent features provide a compact representation of the data, which can be used for further analysis, visualisation, or as input to other models.
Insight Generation: By understanding which patterns the autoencoder fails to reconstruct, analysts can gain insights into rare or extreme events that warrant closer investigation.
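Building on the anomaly-detection point, a short sketch (the 95% cut-off is an arbitrary assumption):
# Flag the players whose stats the autoencoder reconstructs worst (top 5% of errors)
threshold <- quantile(reconstruction_error, 0.95)
anomalous_players <- which(reconstruction_error > threshold)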
Association rule learning is used to identify patterns and relationships between variables in large datasets.
It is applied in areas where understanding the likelihood of items or events appearing together is valuable.
The core idea is to generate rules that describe how certain variables are related.
A rule is typically represented as A → B, meaning that if A occurs, then B is likely to occur as well.
For example, in retail analysis, a common rule might be “If a customer buys bread, they are likely to also buy butter.”
Three key metrics are used (a worked example follows the definitions):
Support refers to how frequently items appear together in the dataset, calculated as the proportion of transactions containing both A and B.
Confidence measures the likelihood that B occurs given A, indicating how often the rule holds true.
Lift evaluates how much more likely B is to occur when A is present, compared to its independent occurrence. A lift value greater than 1 suggests a positive correlation, while a value close to 1 indicates little to no association.
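To make these concrete, suppose 40% of transactions contain bread, 50% contain butter, and 30% contain both (figures are illustrative only). Then support(bread → butter) = 0.3, confidence = 0.3 / 0.4 = 0.75, and lift = 0.75 / 0.5 = 1.5, suggesting that buying bread makes buying butter 1.5 times more likely than if the two were independent.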
Example
Libraries required:
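The chunk isn't shown; the code below relies on:
library(arules)     # transactions class and apriori()
library(arulesViz)  # rule visualisation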
1 Create simple dataset
# Define dataset
hockey_transactions <- list(
c("Player_A", "Pass", "Assist", "Goal"),
c("Player_B", "Pass", "Goal"),
c("Player_A", "Player_B", "Shot", "Save"),
c("Player_C", "Turnover", "Goal"),
c("Player_A", "Pass", "Assist", "Goal"),
c("Player_A", "Player_D", "Shot", "Save"),
c("Player_B", "Pass", "Assist", "Goal"),
c("Player_B", "Pass", "Goal"),
c("Player_C", "Turnover", "Goal"),
c("Player_A", "Pass", "Assist", "Goal")
)
2 Convert to transactions format
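The conversion code isn't shown; with arules this is typically (the object name is an assumption):
hockey_trans <- as(hockey_transactions, "transactions")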
3 Explore dataset
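The output below comes from summarising and inspecting the transactions object, e.g.:
summary(hockey_trans)
inspect(head(hockey_trans, 5))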
transactions as itemMatrix in sparse format with
10 rows (elements/itemsets/transactions) and
10 columns (items) and a density of 0.36
most frequent items:
Goal Pass Player_A Assist Player_B (Other)
8 6 5 4 4 9
element (itemset/transaction) length distribution:
sizes
3 4
4 6
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.0 3.0 4.0 3.6 4.0 4.0
includes extended item information - examples:
labels
1 Assist
2 Goal
3 Pass
items
[1] {Assist, Goal, Pass, Player_A}
[2] {Goal, Pass, Player_B}
[3] {Player_A, Player_B, Save, Shot}
[4] {Goal, Player_C, Turnover}
[5] {Assist, Goal, Pass, Player_A}
4 Run apriori algo
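The call isn't shown, but the parameter specification printed below implies something like:
rules <- apriori(hockey_trans,
                 parameter = list(support = 0.2, confidence = 0.6, minlen = 2))
length(rules)
inspect(head(sort(rules, by = "lift"), 10))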
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
0.6 0.1 1 none FALSE TRUE 5 0.2 2
maxlen target ext
10 rules TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 2
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 10 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [42 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
[1] 42
lhs rhs support confidence coverage lift count
[1] {Player_C} => {Turnover} 0.2 1 0.2 5.0 2
[2] {Turnover} => {Player_C} 0.2 1 0.2 5.0 2
[3] {Save} => {Shot} 0.2 1 0.2 5.0 2
[4] {Shot} => {Save} 0.2 1 0.2 5.0 2
[5] {Goal, Player_C} => {Turnover} 0.2 1 0.2 5.0 2
[6] {Goal, Turnover} => {Player_C} 0.2 1 0.2 5.0 2
[7] {Player_A, Save} => {Shot} 0.2 1 0.2 5.0 2
[8] {Player_A, Shot} => {Save} 0.2 1 0.2 5.0 2
[9] {Pass, Player_A} => {Assist} 0.3 1 0.3 2.5 3
[10] {Goal, Player_A} => {Assist} 0.3 1 0.3 2.5 3
5 Explanation
Once the Apriori algorithm has been applied to our hockey dataset, we can inspect the rules to identify meaningful patterns between players and actions. The key metrics used to evaluate these rules are the support, confidence and lift values defined earlier.
6 Visualise the rules
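The plotting code isn't shown; the control-parameter listing below is consistent with arulesViz's graph method using the ggplot2 engine, e.g.:
plot(rules, method = "graph", engine = "ggplot2")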
Available control parameters (with default values):
layout = stress
circular = FALSE
ggraphdots = NULL
edges = <environment>
nodes = <environment>
nodetext = <environment>
colors = c("#EE0000FF", "#EEEEEEFF")
engine = ggplot2
max = 100
verbose = FALSE
Association Rule Learning uses a number of different algorithms to detect the relationships (associations) between items in the dataset:
Apriori Algorithm - breadth-first approach to iteratively extend frequent itemsets. It prunes unpromising candidates but requires multiple database scans, making it less efficient for large datasets.
FP-Growth Algorithm - builds a Frequent Pattern tree (FP-tree) to compress data, reducing redundant storage. This improves efficiency over Apriori but can struggle with large FP-trees.
Eclat Algorithm - uses a depth-first search and stores itemsets as transaction IDs (tidsets), making it highly efficient for dense datasets, but memory-intensive.