Spatial Cluster Analysis for Landslide Activity in Western Pennsylvania
Abstract
Using digitized quadrangle maps produced by the US Geological Survey, which indicate the locations and types of landslides across the state, a cluster analysis was run to identify groupings of landslides. Zonal statistics were calculated in ArcGIS for different landscape features. Using R, hierarchical clustering was run on these normalized statistical outputs to determine whether the landslides were spatially clustered. Only about nine percent of the landslides were determined to originate in manmade features, indicating that mostly natural geomorphological processes are at work. Additionally, most of the landslide features were spatially grouped in an area that was determined to contain only one cluster: over ninety-nine percent of all the digitized landslides were grouped this way. This indicates that while the mobile regolith is active in this region, the magnitude of these events may be generally small.
Please access the full document below:
R-code for analysis:
# Load packages
library(cluster)
library(dplyr)
library(rmarkdown)
library(knitr)
library(sf)
# Read data
MinMaxCurvatureCSV <- read.csv("C:/Users/Thoma/OneDrive - University of Pittsburgh/Spring Semester 2023/GEOL 1060 Geomorphology/Lab/Lab 2/Group Data/OneDrive_1_4-3-2023/CSVs/MinMaxCurvatureCSV.csv")
MinMaxSlopeCSV <- read.csv("C:/Users/Thoma/OneDrive - University of Pittsburgh/Spring Semester 2023/GEOL 1060 Geomorphology/Lab/Lab 2/Group Data/OneDrive_1_4-3-2023/CSVs/MinMaxSlopeCSV.csv")
MinMaxTMICSV <- read.csv("C:/Users/Thoma/OneDrive - University of Pittsburgh/Spring Semester 2023/GEOL 1060 Geomorphology/Lab/Lab 2/Group Data/OneDrive_1_4-3-2023/CSVs/MinMaxTMICSV.csv")
# Assumes the TRI file is named MinMaxTRICSV.csv, matching the other inputs
MinMaxTRICSV <- read.csv("C:/Users/Thoma/OneDrive - University of Pittsburgh/Spring Semester 2023/GEOL 1060 Geomorphology/Lab/Lab 2/Group Data/OneDrive_1_4-3-2023/CSVs/MinMaxTRICSV.csv")
TabAreaLanslideCSV <- read.csv("C:/Users/Thoma/OneDrive - University of Pittsburgh/Spring Semester 2023/GEOL 1060 Geomorphology/Lab/Lab 2/Group Data/OneDrive_1_4-3-2023/CSVs/TabAreaLanslideCSV.csv")
# Check CRS (tables read with read.csv carry no CRS, so these calls are expected to report NA)
st_crs(MinMaxCurvatureCSV)
st_crs(MinMaxSlopeCSV)
st_crs(MinMaxTMICSV)
st_crs(MinMaxTRICSV)
st_crs(TabAreaLanslideCSV)
# Join data
zstat_1 <- left_join(TabAreaLanslideCSV, MinMaxCurvatureCSV, by = c("ID" = "Id"))
zstat_2 <- left_join(zstat_1, MinMaxSlopeCSV, by = c("ID" = "Id"))
zstat_3 <- left_join(zstat_2, MinMaxTMICSV, by = c("ID" = "Id"))
clusterdata <- left_join(zstat_3, MinMaxTRICSV, by = c("ID" = "Id"))
#Remove NA/no data
clusterdata<-na.omit(clusterdata)
# Bind the statistic columns and scale them (z-scores) into one matrix
clusterdata2 <- scale(cbind(clusterdata[, 3:12], clusterdata[, 16:18]))
#compute distance matrix
d <- dist(clusterdata2)
#hierarchical cluster
hc <- hclust(d, method = "complete")
#plot dendrogram
plot(hc,
  main = "Dendrogram for Western PA Landslide Clusters",
  col = "dark green",
  hang = -1, # Align all leaves at the bottom of the plot
  labels = clusterdata$OBJECTID,
  xlab = "Hierarchical Clusters",
  ylab = "Height (distance between scaled statistics)", # Merge height is unitless, not metres
  font.lab = 2, # Use bold font for the axis titles
  cex.lab = 1.2, # Increase the size of the axis titles
  cex.axis = 0.8, # Reduce the size of the axis tick labels
  cex.main = 1.5, # Increase the size of the main title
  font.main = 4, # Use bold italic font for the main title
  sub = "", # Remove the sub-title
  cex = 0.7, # Reduce the size of the leaf labels
  font.axis = 3, # Use italic font for the axis tick labels
  col.axis = "black", # Set the color of the axis tick labels
  col.main = "black", # Set the color of the main title
  col.sub = "black" # Set the color of the sub-title
  # hang.leaf, main.col, sub.col, font.labels and col.labels were dropped:
  # they are not graphical parameters and only produce warnings
)
box(lwd = 3)
# Extract clusters by cutting the tree at a height of 70
clusters <- cutree(hc, h = 70)
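# Combine the ID column and the cluster extracted for each ID via cutree.
# The IDs come from clusterdata itself (column 1, carried over from TabAreaLanslideCSV),
# so they stay aligned with the rows kept by na.omit() wherever the dropped rows were.
polygonCluster <- cbind(ID = clusterdata[, 1], cluster = clusters)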
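The script above depends on CSV files that are not included here, so the following is a minimal, self-contained sketch of the same sequence (drop missing rows, scale, distance matrix, complete-linkage clustering, cut the tree, tabulate cluster shares) on synthetic data. Everything in it is an assumption made for illustration: the `toy` data frame, its column names and values, and the choice to cut with `k = 2` rather than at a fixed height. It is not the landslide data and does not reproduce the reported results.

```r
# Minimal sketch on synthetic data. All values and column names below are
# hypothetical; they are not the landslide zonal statistics.
set.seed(42)

# 50 made-up polygons with three made-up terrain statistics.
# Rows 49 and 50 are deliberately extreme so that they form their own cluster.
toy <- data.frame(
  ID        = 1:50,
  slope     = c(rnorm(48, mean = 15, sd = 2), 60, 65),
  curvature = c(rnorm(48, mean = 0,  sd = 1), 12, 14),
  tri       = c(rnorm(48, mean = 5,  sd = 1), 30, 33)
)
toy$slope[3] <- NA                      # one polygon with missing data

toy_clean  <- na.omit(toy)              # drop incomplete rows
toy_scaled <- scale(toy_clean[, -1])    # z-score every statistic, but not the ID
toy_hc     <- hclust(dist(toy_scaled), method = "complete")

# Cut into a fixed number of groups (k) instead of at a fixed height (h)
toy_groups <- cutree(toy_hc, k = 2)

# Pair each cluster label with the ID of the row it was computed from
toy_result <- data.frame(ID = toy_clean$ID, cluster = toy_groups)

# Share of polygons in each cluster, in percent
toy_share <- round(100 * prop.table(table(toy_result$cluster)), 1)
print(toy_share)
```

Taking the IDs from the cleaned table (`toy_clean$ID`) rather than from the raw input keeps labels and rows aligned even when `na.omit()` removes rows from the middle of the data. A table of cluster shares like `toy_share` is one way to check a statement such as "most features fall in a single cluster".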