Learning about groups in data without prior information can be done using the following steps:
- Load the data and run a PCA:
library(factoextra)
library(Biobase)
load(file.path(getwd(), "datasets", "ch1", "modencodefly_eset.RData") )
expr_pca <- prcomp(exprs(modencodefly.eset), scale = TRUE, center = TRUE)
fviz_screeplot(expr_pca)
- Extract the principal components and estimate the optimal clusters:
main_components <- expr_pca$rotation[, 1:3]
fviz_nbclust(main_components, kmeans, method = "wss")
- Perform k-means clustering and visualize the result:
kmean_clus <- kmeans(main_components, 5, nstart=25, iter.max=1000)
fviz_cluster(
  kmean_clus,
  data = main_components,
  palette = RColorBrewer::brewer.pal(5, "Set2"),
  ggtheme = theme_minimal(),
  main = "k-Means Sample Clustering"
)
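To see what the steps above compute without needing the dataset or the plotting packages, the same workflow can be sketched in base R on simulated data. This is only an illustration under stated assumptions: `sim_expr` is a made-up stand-in for `exprs(modencodefly.eset)`, the three sample groups are invented, and all `sim_*` names are hypothetical. The `wss` vector holds the total within-cluster sum of squares for each candidate k, which is the quantity an elbow plot with `method = "wss"` displays.

```r
# Simulated stand-in for an expression matrix: 200 genes (rows) x 30 samples
# (columns), where samples fall into three invented groups that each share
# a group-specific expression profile plus noise.
set.seed(42)
n_genes <- 200
group <- rep(1:3, each = 10)
profiles <- matrix(rnorm(n_genes * 3, sd = 3), nrow = n_genes)
sim_expr <- profiles[, group] + matrix(rnorm(n_genes * 30), nrow = n_genes)
colnames(sim_expr) <- paste0("sample", 1:30)

# prcomp() treats rows (genes) as observations and columns (samples) as
# variables, so the rotation matrix has one row per sample.
sim_pca <- prcomp(sim_expr, scale. = TRUE, center = TRUE)
sim_components <- sim_pca$rotation[, 1:3]

# Total within-cluster sum of squares for k = 1..8.
wss <- sapply(1:8, function(k) {
  kmeans(sim_components, centers = k, nstart = 25, iter.max = 1000)$tot.withinss
})
print(round(wss, 3))

# Cluster the samples at the chosen k and cross-tabulate the cluster
# assignments against the simulated groups.
sim_clus <- kmeans(sim_components, centers = 3, nstart = 25, iter.max = 1000)
print(table(cluster = sim_clus$cluster, group = group))
```

Calling `plot(1:8, wss, type = "b")` draws the elbow curve by hand. On real data the choice of k is a judgment call read off that curve, not something the code decides for you.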