In this notebook, we will learn how to handle groups of data — or, in R terminology, groups of objects.
R provides several ways for grouping objects:
- Vector: Collection of objects with the same data type
- Matrix: Table consisting of objects with the same data type
- List: Collection of objects with possibly different data types
A vector is a collection of objects with the same data type.
To create a vector, use the c() function:
amino_acids <- c("methionine", "leucine", "alanine", "valine", "glutamine", "threonine")
Accessing items in a vector can be done like so:
amino_acids[1] # Access the first element
[1] "methionine"
amino_acids[2] # Access the second element
[1] "leucine"
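Indexing is not limited to a single position. As a minimal sketch (the vector is redefined here so that the snippet runs on its own), the following shows that R's 1-based indexing also accepts a vector of positions, a range, or a negative index that excludes an element:

```r
amino_acids <- c("methionine", "leucine", "alanine", "valine", "glutamine", "threonine")

length(amino_acids)     # Number of elements in the vector
amino_acids[c(1, 3)]    # First and third elements
amino_acids[2:4]        # Second through fourth elements
amino_acids[-1]         # Everything except the first element
```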
We can also combine two vectors into a single vector using the c() function:
addl_amino_acids <- c("glycine", "cysteine", "serine")
many_amino_acids <- c(amino_acids, addl_amino_acids)
many_amino_acids
[1] "methionine" "leucine" "alanine" "valine" "glutamine" "threonine" "glycine" "cysteine" "serine"
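Because a vector holds a single data type, combining values of different types with c() makes R convert them to one common type. The snippet below is a small illustration of that behavior; the variable name mixed is only a placeholder chosen for this example:

```r
# Mixing a number, a string, and a logical value in one vector
mixed <- c(1, "leucine", TRUE)

mixed          # Every element has been converted to a character string
class(mixed)   # The type shared by all elements
```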
A matrix is a table consisting of objects with the same data type.
To create a matrix, use the matrix() function:
amino_acids_matrix <- matrix(c("methionine", "leucine", "alanine", "valine", "glutamine", "threonine"), nrow=3, ncol=2)
amino_acids_matrix
[,1] [,2]
[1,] "methionine" "valine"
[2,] "leucine" "glutamine"
[3,] "alanine" "threonine"
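Note that matrix() fills the table column by column by default, which is why "valine" ends up at the top of the second column. As a hedged sketch, passing byrow = TRUE fills the matrix row by row instead; the name by_row is introduced here purely for illustration:

```r
by_row <- matrix(c("methionine", "leucine", "alanine", "valine", "glutamine", "threonine"),
                 nrow = 3, ncol = 2, byrow = TRUE)

by_row        # Rows are filled first: "methionine" and "leucine" share row 1
dim(by_row)   # Number of rows, then number of columns
```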
We can add rows and columns using rbind() and cbind(), respectively:
amino_acids_matrix <- rbind(amino_acids_matrix, c("proline", "arginine"))
amino_acids_matrix
[,1] [,2]
[1,] "methionine" "valine"
[2,] "leucine" "glutamine"
[3,] "alanine" "threonine"
[4,] "proline" "arginine"
amino_acids_matrix <- cbind(amino_acids_matrix, c("histidine", "phenylalanine", "tryptophan", "selenocysteine"))
amino_acids_matrix
[,1] [,2] [,3]
[1,] "methionine" "valine" "histidine"
[2,] "leucine" "glutamine" "phenylalanine"
[3,] "alanine" "threonine" "tryptophan"
[4,] "proline" "arginine" "selenocysteine"
Accessing items in a matrix can be done like so:
amino_acids_matrix[1, 2] # Accesses the first row, second column
[1] "valine"
amino_acids_matrix[4, 3] # Accesses the fourth row, third column
[1] "selenocysteine"
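Leaving one of the two indices empty selects an entire row or column. The sketch below uses a small matrix named m (a placeholder, defined here so that the snippet is self-contained) to show this:

```r
m <- matrix(c("methionine", "leucine", "alanine", "valine", "glutamine", "threonine"),
            nrow = 3, ncol = 2)

m[1, ]     # Entire first row, returned as a vector
m[, 2]     # Entire second column, returned as a vector
m[1:2, ]   # First two rows, returned as a 2 x 2 matrix
```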
We can also check whether the items in a matrix satisfy a given condition:
- any() checks if at least one of the items satisfies the condition
- all() checks if all of the items satisfy the condition
numbers <- matrix(c(2, 4, 6, 8, 10, 12, 14, 16, 18), nrow=3, ncol=3)
any(numbers < 6)
[1] TRUE
any(numbers < 1)
[1] FALSE
all(numbers < 20)
[1] TRUE
all(numbers < 5)
[1] FALSE
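Both any() and all() work on the logical matrix that a comparison such as numbers < 6 produces. As a rough sketch, the same logical matrix can also be used to count or extract the items that satisfy the condition:

```r
numbers <- matrix(c(2, 4, 6, 8, 10, 12, 14, 16, 18), nrow = 3, ncol = 3)

numbers < 6            # Logical matrix: TRUE where the condition holds
sum(numbers < 6)       # Number of items that satisfy the condition
numbers[numbers < 6]   # The items themselves, returned as a vector
```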
It is possible to name the rows and columns (also called the dimensions) of a matrix using rownames() and colnames(), respectively.
This can greatly aid in making our matrix more descriptive, especially when we perform some analysis.
colnames(numbers) <- c("Treatment 1", "Treatment 2", "Treatment 3")
rownames(numbers) <- c("Patient 1", "Patient 2", "Patient 3")
numbers
Treatment 1 Treatment 2 Treatment 3
Patient 1 2 8 14
Patient 2 4 10 16
Patient 3 6 12 18
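Once the dimensions are named, the names can be used in place of numeric indices. The following sketch rebuilds the same matrix so that it runs on its own and then looks up values by name:

```r
numbers <- matrix(c(2, 4, 6, 8, 10, 12, 14, 16, 18), nrow = 3, ncol = 3)
colnames(numbers) <- c("Treatment 1", "Treatment 2", "Treatment 3")
rownames(numbers) <- c("Patient 1", "Patient 2", "Patient 3")

numbers["Patient 2", "Treatment 3"]   # Single value, looked up by row and column name
numbers["Patient 1", ]                # All treatments for one patient
numbers[, "Treatment 2"]              # One treatment across all patients
```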
Finally, we demonstrate some matrix operations:
numbers1 <- matrix(c(2, 4, 6, 8, 10, 12, 14, 16, 18), nrow=3, ncol=3)
numbers2 <- matrix(c(12, 14, 16, 18, 20, 22, 24, 26, 28), nrow=3, ncol=3)
numbers1 + numbers2
[,1] [,2] [,3]
[1,] 14 26 38
[2,] 18 30 42
[3,] 22 34 46
numbers1 - numbers2
[,1] [,2] [,3]
[1,] -10 -10 -10
[2,] -10 -10 -10
[3,] -10 -10 -10
numbers1 * numbers2 # Element-wise multiplication
[,1] [,2] [,3]
[1,] 24 144 336
[2,] 56 200 416
[3,] 96 264 504
numbers1 / numbers2
[,1] [,2] [,3]
[1,] 0.1666667 0.4444444 0.5833333
[2,] 0.2857143 0.5000000 0.6153846
[3,] 0.3750000 0.5454545 0.6428571
100 * numbers1 # Scalar multiplication
[,1] [,2] [,3]
[1,] 200 800 1400
[2,] 400 1000 1600
[3,] 600 1200 1800
t(numbers1) # Matrix transpose
[,1] [,2] [,3]
[1,] 2 4 6
[2,] 8 10 12
[3,] 14 16 18
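The * operator multiplies matrices element by element. For the matrix product from linear algebra, R provides the %*% operator. The sketch below contrasts the two on a pair of small 2 x 2 matrices, a and b, which are made up for this example:

```r
a <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
b <- matrix(c(5, 6, 7, 8), nrow = 2, ncol = 2)

a * b     # Element-wise product: each item of a times the matching item of b
a %*% b   # Matrix product: rows of a combined with columns of b
```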
A list is a collection of objects with possibly different data types.
To create a list, use the list() function:
assorted_list <- list("proline", "methionine", 1, 2)
assorted_list
[[1]]
[1] "proline"
[[2]]
[1] "methionine"
[[3]]
[1] 1
[[4]]
[1] 2
Accessing list items can be a bit tricky, though. Observe how assorted_list[1] does not return the item "proline" per se; it actually returns a list containing "proline".
assorted_list[1]
[[1]]
[1] "proline"
If we want the item "proline" itself to be returned, we have to use double brackets:
assorted_list[[1]]
[1] "proline"
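List items can also be given names, in which case they can be retrieved by name rather than by position. The following is a minimal sketch; the list residue and its field names are hypothetical and chosen only for illustration:

```r
# A named list mixing character and numeric items
residue <- list(name = "proline", abbreviation = "Pro", carbons = 5)

names(residue)         # Names of the list items
residue$name           # Access an item by name with $
residue[["carbons"]]   # Double brackets also accept a name
```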
If we want a summary of the list elements, we can use the summary() function.
summary(assorted_list)
Length Class Mode
[1,] 1 -none- character
[2,] 1 -none- character
[3,] 1 -none- numeric
[4,] 1 -none- numeric
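Other base R functions give a similar overview of a list. As a hedged sketch, length() counts the items, sapply() with class() reports the type of each item, and str() prints a compact description of the whole structure:

```r
assorted_list <- list("proline", "methionine", 1, 2)

length(assorted_list)                     # Number of items in the list
item_classes <- sapply(assorted_list, class)
item_classes                              # Class of each item, in order
str(assorted_list)                        # Compact display of the list structure
```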
De La Salle University, Manila, Philippines, gonzales.markedward@gmail.com