library("tidyverse")
library("tibble")
library("msigdbr")
library("ggplot2")
library("TCGAbiolinks")
library("RNAseqQC")
library("DESeq2")
library("ensembldb")
library("purrr")
library("magrittr")
library("vsn")
library("matrixStats")
library("dplyr")
library("grex")
DATA_DIR <- "data/GTEx/"
GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt
- A
de-identified, open access version of the sample annotations available
in dbGaP (database of genotypes and phenotypes)GTEx_Analysis_v8_Annotations_SubjectPhenotypesDS.txt
-
A de-identified, open access version of the subject phenotypes available
in dbGaP.Download these files by running:
wget https://storage.googleapis.com/adult-gtex/annotations/v8/metadata-files/GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt -P data/GTEx/
wget https://storage.googleapis.com/adult-gtex/annotations/v8/metadata-files/GTEx_Analysis_v8_Annotations_SubjectPhenotypesDS.txt -P data/GTEx/
sample.df <- read.delim(paste0(DATA_DIR, "GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt"), as.is = TRUE, header = TRUE, row.names = 1)
subject.df <- read.delim(paste0(DATA_DIR, "GTEx_Analysis_v8_Annotations_SubjectPhenotypesDS.txt"), as.is = TRUE, header = TRUE, row.names = 1)
The DTHHRDY
column refers to the 4-point Hardy scale: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/variable.cgi?study_id=phs000424.v4.p1&phv=169092
Refer to the metadata files here for more information: https://gtexportal.org/home/downloads/adult-gtex/metadata
subject.df
Refer to the metadata files here for more information: https://gtexportal.org/home/downloads/adult-gtex/metadata
sample.df
Extract entries that pertain to transcriptomic (RNA) data.
rnaseq.sample.df <- sample.df[sample.df["SMAFRZE"] == "RNASEQ", ]
rnaseq.sample.df
The SMTSD
column refers to the tissue type (i.e., the
area from which the sample was taken).
as.matrix(sort(table(rnaseq.sample.df["SMTSD"]), decreasing = TRUE))
[,1]
Muscle - Skeletal 803
Whole Blood 755
Skin - Sun Exposed (Lower leg) 701
Adipose - Subcutaneous 663
Artery - Tibial 663
Thyroid 653
Nerve - Tibial 619
Skin - Not Sun Exposed (Suprapubic) 604
Lung 578
Esophagus - Mucosa 555
Adipose - Visceral (Omentum) 541
Esophagus - Muscularis 515
Cells - Cultured fibroblasts 504
Breast - Mammary Tissue 459
Artery - Aorta 432
Heart - Left Ventricle 432
Heart - Atrial Appendage 429
Colon - Transverse 406
Esophagus - Gastroesophageal Junction 375
Colon - Sigmoid 373
Testis 361
Stomach 359
Pancreas 328
Pituitary 283
Adrenal Gland 258
Brain - Cortex 255
Brain - Caudate (basal ganglia) 246
Brain - Nucleus accumbens (basal ganglia) 246
Prostate 245
Brain - Cerebellum 241
Spleen 241
Artery - Coronary 240
Liver 226
Brain - Cerebellar Hemisphere 215
Brain - Frontal Cortex (BA9) 209
Brain - Putamen (basal ganglia) 205
Brain - Hypothalamus 202
Brain - Hippocampus 197
Small Intestine - Terminal Ileum 187
Ovary 180
Brain - Anterior cingulate cortex (BA24) 176
Cells - EBV-transformed lymphocytes 174
Minor Salivary Gland 162
Brain - Spinal cord (cervical c-1) 159
Vagina 156
Brain - Amygdala 152
Uterus 142
Brain - Substantia nigra 139
Kidney - Cortex 85
Bladder 21
Cervix - Endocervix 10
Cervix - Ectocervix 9
Fallopian Tube 9
Kidney - Medulla 4
We are only interested in those from the colorectal area.
colon.sample.df <- rnaseq.sample.df %>% dplyr::filter(SMTSD == "Colon - Sigmoid")
colon.sample.df
GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct
contains the gene TPMs.
Download this file by running:
wget https://storage.googleapis.com/adult-gtex/bulk-gex/v8/rna-seq/GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct.gz -P data/GTEx/
tpm.df <- read.delim(paste0(DATA_DIR, "GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct"),
as.is = T, row.names = 1, check.names = FALSE, skip = 2
)
gene.names.df <- tpm.df[, "Description", drop = FALSE]
tpm.df <- tpm.df[, !(names(tpm.df) %in% c("Description"))]
{
cat(paste("Number of genes in table: ", dim(tpm.df)[1]))
}
Number of genes in table: 56200
Perform some data preprocessing.
# Remove version number: https://www.rdocumentation.org/packages/grex/versions/1.9/topics/cleanid
tpm.df$ensembl <- cleanid(rownames(tpm.df))
head(tpm.df$ensembl)
[1] "ENSG00000223972" "ENSG00000227232" "ENSG00000278267" "ENSG00000243485" "ENSG00000237613" "ENSG00000268020"
FADD is one of the key genes involved in necroptosis.
# Create a named vector to map names to IDs
name_to_id <- setNames(rownames(gene.names.df), gene.names.df$Description)
# Create a named vector to map IDs to names
id_to_name <- setNames(gene.names.df$Description, rownames(gene.names.df))
# To retrieve the ID for 'FADD'
print(name_to_id[["FADD"]])
[1] "ENSG00000168040.4"
print(id_to_name[["ENSG00000168040.4"]])
[1] "FADD"
We obtain the gene set from the Human MSigDB Collections:
GOBP_NECROPTOTIC_SIGNALING_PATHWAY
refers to the
necroptosis gene set (found via MSigDB’s search functionality: https://www.gsea-msigdb.org/gsea/msigdb/human/search.jsp).C5
consists of genes annotated by the same ontology
term (https://www.gsea-msigdb.org/gsea/msigdb/).GO:BP
refers to the “biological process” category in
Gene Ontology.CAVEAT! GOBP_NECROPTOTIC_SIGNALING_PATHWAY
contains only
8 genes.
necroptosis.genes <- msigdbr(species = "human", category = "C5", subcategory = "GO:BP") %>%
dplyr::filter(gs_name == "GOBP_NECROPTOTIC_SIGNALING_PATHWAY")
necroptosis.genes
We obtain the gene set from the Human MSigDB Collections:
WP_FERROPTOSIS
refers to the ferroptosis gene set
(found via MSigDB’s search functionality: https://www.gsea-msigdb.org/gsea/msigdb/human/search.jsp).C2
consists of the curated gene sets (https://www.gsea-msigdb.org/gsea/msigdb/).CP:WIKIPATHWAYS
refers to the curated gene set from
WikiPathways.Note that there is another ferroptosis gene set:
GOBP_FERROPTOSIS
.
However, it contains only 10 genes
(for comparison, WP_FERROPTOSIS
contains 64 genes).
ferroptosis.genes <- msigdbr(species = "human", category = "C2", subcategory = "CP:WIKIPATHWAYS") %>%
dplyr::filter(gs_name == "WP_FERROPTOSIS")
ferroptosis.genes
We obtain the gene set from the Human MSigDB Collections:
REACTOME_PYROPTOSIS
refers to the pyroptosis gene set
(found via MSigDB’s search functionality: https://www.gsea-msigdb.org/gsea/msigdb/human/search.jsp).C2
consists of the curated gene sets (https://www.gsea-msigdb.org/gsea/msigdb/).CP:REACTOME
refers to the curated gene set from
Reactome.REACTOME_PYROPTOSIS
contains 27 genes.
pyroptosis.genes <- msigdbr(species = "human", category = "C2", subcategory = "CP:REACTOME") %>%
dplyr::filter(gs_name == "REACTOME_PYROPTOSIS")
pyroptosis.genes
Select only the tissue samples (columns) from the colorectal area.
tpm.colon.df <- tpm.df %>% dplyr::select(c(ensembl, rownames(colon.sample.df)))
tpm.colon.df
Get the TPM for the genes in the necroptosis gene set.
tpm.colon.necro.df <- tpm.colon.df %>% dplyr::filter(ensembl %in% necroptosis.genes$ensembl_gene)
tpm.colon.necro.df2 <- left_join(tpm.colon.necro.df, necroptosis.genes %>% dplyr::select(ensembl_gene, gene_symbol), by = c("ensembl" = "ensembl_gene"))
tpm.colon.necro.df$gene_symbol <- tpm.colon.necro.df2$gene_symbol
tpm.colon.necro.df
Compute the median TPM per gene.
gene.expressions.necro <- data.frame(
row.names = tpm.colon.necro.df$gene_symbol,
tissue = "Colon",
tpm = matrixStats::rowMedians(as.matrix(tpm.colon.necro.df %>% dplyr::select(-c("ensembl", "gene_symbol"))))
)
gene.expressions.necro
Plot the gene expression.
ggplot(gene.expressions.necro, aes(y = tissue, x = rownames(gene.expressions.necro), fill = tpm)) +
geom_tile() +
scale_fill_gradient(low = "white", high = "red") +
theme_minimal() +
theme(
axis.text.x = element_text(size = 9, hjust = 1)
) +
labs(
title = "Expression of necroptosis-related genes in GTEx colon tissues",
x = "Gene",
y = "Area",
fill = "TPM"
) +
geom_text(aes(label = tpm), vjust = 1)
Sanity Check: RIPK1 and necroptosis - https://www.nature.com/articles/s12276-022-00847-4
Get the TPM for the genes in the ferroptosis gene set.
tpm.colon.ferro.df <- tpm.colon.df %>% dplyr::filter(ensembl %in% ferroptosis.genes$ensembl_gene)
tpm.colon.ferro.df2 <- left_join(tpm.colon.ferro.df, ferroptosis.genes %>% dplyr::select(ensembl_gene, gene_symbol), by = c("ensembl" = "ensembl_gene"))
tpm.colon.ferro.df$gene_symbol <- tpm.colon.ferro.df2$gene_symbol
tpm.colon.ferro.df
Compute the median TPM per gene.
gene.expressions.ferro <- data.frame(
row.names = tpm.colon.ferro.df$gene_symbol,
tissue = "Colon",
tpm = matrixStats::rowMedians(as.matrix(tpm.colon.ferro.df %>% dplyr::select(-c("ensembl", "gene_symbol"))))
)
gene.expressions.ferro
Plot the gene expression.
ggplot(gene.expressions.ferro, aes(y = tissue, x = rownames(gene.expressions.ferro), fill = tpm)) +
geom_tile() +
scale_fill_gradient(low = "white", high = "red") +
theme_minimal() +
theme(
axis.text.x = element_text(size = 7, angle = 90, hjust = 1)
) +
labs(
title = "Expression of ferroptosis-related genes in GTEx colon tissues",
x = "Gene",
y = "Area",
fill = "TPM"
)
Sanity Check: FTH1 and ferroptosis - https://www.nature.com/articles/s41420-022-00902-z
Get the TPM for the genes in the pyroptosis gene set.
tpm.colon.pyro.df <- tpm.colon.df %>% dplyr::filter(ensembl %in% pyroptosis.genes$ensembl_gene)
tpm.colon.pyro.df2 <- left_join(tpm.colon.pyro.df, pyroptosis.genes %>% dplyr::select(ensembl_gene, gene_symbol), by = c("ensembl" = "ensembl_gene"))
tpm.colon.pyro.df$gene_symbol <- tpm.colon.pyro.df2$gene_symbol
tpm.colon.pyro.df
Compute the median TPM per gene.
gene.expressions.pyro <- data.frame(
row.names = tpm.colon.pyro.df$gene_symbol,
tissue = "Colon",
tpm = matrixStats::rowMedians(as.matrix(tpm.colon.pyro.df %>% dplyr::select(-c("ensembl", "gene_symbol"))))
)
gene.expressions.pyro
Plot the gene expression.
ggplot(gene.expressions.pyro, aes(y = tissue, x = rownames(gene.expressions.pyro), fill = tpm)) +
geom_tile() +
scale_fill_gradient(low = "white", high = "red") +
theme_minimal() +
theme(
axis.text.x = element_text(size = 7, angle = 90, hjust = 1)
) +
labs(
title = "Expression of pyroptosis-related genes in GTEx colon tissues",
x = "Gene",
y = "Area",
fill = "TPM"
)
Sanity Check: CHMP4B and pyroptosis - https://pubmed.ncbi.nlm.nih.gov/38823000/
De La Salle University, Manila, Philippines, kim_leejra@dlsu.edu.ph↩︎
De La Salle University, Manila, Philippines, gonzales.markedward@gmail.com↩︎
De La Salle University, Manila, Philippines, anish.shrestha@dlsu.edu.ph↩︎