In this notebook, we learn how to perform statistical inference in R, using one-sample and two-sample t-tests as examples.

Preliminaries

We start by loading the tidyverse.

library("tidyverse")
Warning: package ‘tidyverse’ was built under R version 4.4.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

We load our dataset.

data <- read_delim("phages.tsv")
Rows: 18406 Columns: 27
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (19): Accession, Description, Classification, Molecule, Modification Date, Low Coding Capacity Warning, Host, Lowest Taxa, Genus, Sub-family, Family, Order, Class, ...
dbl  (7): Genome Length (bp), molGC (%), Number CDS, Positive Strand (%), Negative Strand (%), Coding Capacity(%), tRNAs
lgl  (1): Jumbophage
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
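
The message above suggests either specifying the column types or passing `show_col_types = FALSE`. The following is a minimal sketch of the second option, assuming a tiny made-up tab-separated file written to a temporary path (the three rows are illustrative and are not taken from phages.tsv). It also states the delimiter explicitly instead of letting `read_delim()` guess it.

```r
library(readr)

# Hypothetical three-row stand-in for phages.tsv (values are made up)
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "Accession\tHost\tmolGC (%)",
  "A1\tEscherichia\t44.5",
  "A2\tStaphylococcus\t32.1",
  "A3\tEscherichia\t50.0"
), tmp)

# State the delimiter explicitly and silence the column-specification message
demo <- read_delim(tmp, delim = "\t", show_col_types = FALSE)

# Column names that contain spaces or symbols must be wrapped in backticks
demo$`molGC (%)`
```

The backticks are the reason the later code writes `molGC (%)` inside them: readr keeps the original column names as they appear in the file.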

One-sample t-test

We take a look at the unique hosts first.

unique(data %>% pull(Host))
  [1] "Unspecified"           "Prevotella"            "StxXX"                 "Klebsiella"            "Pseudomonas"           "Campylobacter"        
  [7] "Escherichia"           "Clostridioides"        "Salmonella"            "Plectonema"            "Vibrio"                "Stenotrophomonas"     
 [13] "Pantoea"               "Enterobacter"          "Haloferax"             "Staphylococcus"        "Corynebacterium"       "Gordonia"             
 [19] "Microbacterium"        "Mycobacterium"         "Arthrobacter"          "Dipodfec"              "Peromfec"              "Sigmofec"             
 [25] "Riemerella"            "Streptomyces"          "Pectobacterium"        "Enterococcus"          "Natrinema"             "Acinetobacter"        
 [31] "Shigella"              "Rhodobacter"           "Clostridium"           "Proteus"               "Cronobacter"           "Microcystis"          
 [37] "Verrucomicrobia"       "Xanthomonas"           "Bacillus"              "Actinomyces"           "Halomonas"             "Serratia"             
 [43] "Sulfolobus"            "Prochlorococcus"       "Huginn"                "Muninn"                "Caulobacter"           "Dickeya"              
 [49] "Synechococcus"         "Erwinia"               "Aeromonas"             "Cedecea"               "Mycolicibacterium"     "Fusobacterium"        
 [55] "Janthinobacterium"     "Eggerthella"           "Bifidobacterium"       "Lactobacillus"         "Brevundimonas"         "Cytobacillus"         
 [61] "Tetragenococcus"       "Elizabethkingia"       "Bacteroides"           "Luteibacter"           "Burkholderia"          "Pseudarthrobacter"    
 [67] "Brochothrix"           "Citromicrobium"        "Brevibacterium"        "Leuconostoc"           "Psychrobacter"         "Glutamicibacter"      
 [73] "Curtobacterium"        "Kurthia"               "Listeria"              "Carnobacterium"        "Liberibacter"          "Rhodococcus"          
 [79] "Octadecabacter"        "Paraglaciecola"        "Propionibacterium"     "Flavobacterium"        "Yersinia"              "Levilactobacillus"    
 [85] "Lactiplantibacillus"   "Chryseobacterium"      "Roseburia"             "Sphaerotilus"          "Lokiarchaeota"         "Raoultella"           
 [91] "Nostoc"                "Paracoccus"            "Citrobacter"           "Streptococcus"         "Haemophilus"           "Chlamydia"            
 [97] "Enterobacteria"        "Helicobacter"          "Pararheinheimera"      "Pseudaeromonas"        "Parageobacillus"       "Geobacillus"          
[103] "Edwardsiella"          "Halobacterium"         "Lentibacter"           "Ralstonia"             "Alteromonas"           "Halorubrum"           
[109] "Spiroplasma"           "Providencia"           "Haloarcula"            "Shewanella"            "Thermus"               "Lactococcus"          
[115] "Cylindrospermopsis"    "Rahnella"              "Lentisphaerae"         "Hydrogenophilus"       "Phormidium"            "Halogeometricum"      
[121] "Pasteurella"           "Morganella"            "Pseudoxanthomonas"     "Erythrobacter"         "Hafnia"                "Pelagibacter"         
[127] "Roseobacter"           "Mycoplasma"            "Desulfovibrio"         "Acidovorax"            "Hamiltonella"          "Pseudoalteromonas"    
[133] "Oceanospirillum"       "Alcaligenes"           "Paenibacillus"         "Kosakonia"             "Komagataeibacter"      "Acetobacter"          
[139] "Achromobacter"         "Bordetella"            "Altiarchaeum"          "Myxococcus"            "Akkermansia"           "Ruminococcus"         
[145] "Psychrobacillus"       "Dinoroseobacter"       "Rhizobium"             "Polaribacter"          "Solobacterium"         "Stappia"              
[151] "Methanoculleus"        "Wolbachia"             "Winogradskyella"       "Tenacibaculum"         "Olleya"                "Cellulophaga"         
[157] "Maribacter"            "Anabaena"              "Sphingopyxis"          "Nocardia"              "Agrobacterium"         "Thermocrinis"         
[163] "Cellulosimicrobium"    "Arcanobacterium"       "Gardnerella"           "Parabacteroides"       "Nitratiruptor"         "Rheinheimera"         
[169] "Cutibacterium"         "Protaetiibacter"       "Arthronema"            "Methanocaldococcus"    "Ruegeria"              "Haloterrigena"        
[175] "Sphingomonas"          "Buttiauxella"          "Oenococcus"            "Micromonospora"        "Sulfitobacter"         "Pseudanabaena"        
[181] "Pyrobaculum"           "Metallosphaera"        "Acidianus"             "Saccharolobus"         "Aminobacter"           "Phaeobacter"          
[187] "Butyrivibrio"          "Mesorhizobium"         "Xylella"               "Thermoproteus"         "Ochrobactrum"          "Sporosarcina"         
[193] "Brucella"              "Sinorhizobium"         "Salinibacter"          "Pediococcus"           "Methanobacterium"      "Photobacterium"       
[199] "Mannheimia"            "Methanosarcina"        "Aeribacillus"          "Rothia"                "Curvibacter"           "Aerococcus"           
[205] "Nitrosopumilus"        "Delftia"               "Virgibacillus"         "Nodularia"             "Exiguobacterium"       "Natrialba"            
[211] "Meiothermus"           "Acidithiobacillus"     "Thermobifida"          "Marinobacter"          "Weissella"             "Azobacteroides"       
[217] "Mastigocladus"         "Leptospira"            "Marinomonas"           "Anoxybacillus"         "Faecalibacterium"      "Pontimonas"           
[223] "Gluconobacter"         "Erysipelothrix"        "Aeropyrum"             "Plesiomonas"           "Aphanizomenon"         "Lysinibacillus"       
[229] "Caldibacillus"         "Leclercia"             "Sphingobium"           "Rathayibacter"         "Peptoclostridium"      "Xenohaliotis"         
[235] "Bradyrhizobium"        "Aquamicrobium"         "Salinivibrio"          "Pelagibaca"            "Thiobacimonas"         "Rhodovulum"           
[241] "Croceibacter"          "Tsukamurella"          "Marinitoga"            "Acaryochloris"         "Rhodoferax"            "Moraxella"            
[247] "Brevibacillus"         "Skermania"             "Hydrogenobaculum"      "Lelliottia"            "Aurantimonas"          "Roseovarius"          
[253] "Thermoanaerobacterium" "Clavibacter"           "Stygiolobus"           "Nitrincola"            "Aggregatibacter"       "Colwellia"            
[259] "Puniceispirillum"      "Halogranum"            "Loktanella"            "Tetraselmis"           "Dunaliella"            "Tetrasphaera"         
[265] "Saccharomonospora"     "Nonlabens"             "Bdellovibrio"          "Celeribacter"          "Iodobacter"            "Salisaeta"            
[271] "Thermococcus"          "Planktothrix"          "Nitrososphaera"        "Azospirillum"          "Methanothermobacter"   "Thalassomonas"        
[277] "Sodalis"               "Silicibacter"          "Listonella"            "Actinoplanes"          "Kluyvera"              "Pyrococcus"           
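
With this many hosts, it can help to know how many phages each host has before choosing groups to test. The following is a small sketch using base R's `table()` on a made-up vector of host labels; the vector `hosts` is hypothetical and stands in for the result of `data %>% pull(Host)`.

```r
# Made-up host labels standing in for data %>% pull(Host)
hosts <- c("Escherichia", "Staphylococcus", "Escherichia",
           "Unspecified", "Escherichia", "Staphylococcus")

# Tabulate the labels and sort from most to least frequent
host_counts <- sort(table(hosts), decreasing = TRUE)
host_counts

# Number of distinct hosts
length(unique(hosts))
```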

We get the guanine-cytosine (GC) content of the phages that target Escherichia.

gc_escherichia <- data %>% 
  filter(Host == "Escherichia") %>% 
  pull(`molGC (%)`)
gc_escherichia
   [1] 43.029 39.517 42.954 43.845 37.465 35.432 54.615 45.551 45.213 43.799 52.859 48.323 49.842 51.017 35.342 43.501 50.590 37.354 37.973 42.992 33.717 35.425 35.330
  [24] 50.864 49.026 44.549 46.600 52.732 46.633 50.010 54.528 45.890 39.403 37.423 47.008 35.357 50.951 42.014 50.025 54.324 54.517 35.597 47.016 41.347 40.528 39.006
  [47] 46.997 46.169 49.644 50.971 48.296 47.043 45.556 52.034 43.575 43.655 40.519 35.215 48.792 48.792 54.572 40.180 45.003 44.979 50.054 50.074 46.633 45.009 45.091
  [70] 35.347 38.971 46.096 54.628 39.002 39.006 39.007 39.011 48.178 46.036 47.098 35.376 35.374 51.104 35.425 38.986 39.004 43.875 44.179 44.553 44.553 44.940 49.489
  [93] 56.644 49.034 50.221 50.154 49.748 54.215 47.231 46.061 47.523 46.369 55.595 50.725 54.965 47.286 50.235 50.201 50.853 48.693 49.657 52.125 48.049 51.260 50.733
 [116] 50.618 51.720 56.382 49.570 47.376 46.715 49.661 42.825 46.423 49.229 47.417 50.091 47.447 50.167 49.786 49.782 47.712 50.738 49.227 53.424 38.932 45.787 45.847
 [139] 49.792 38.926 35.279 39.047 38.892 39.028 37.854 38.986 39.186 38.912 37.410 38.940 39.028 39.136 37.411 37.533 43.759 48.874 47.484 39.956 46.068 43.741 46.638
 [162] 35.368 49.743 39.021 38.845 38.793 51.405 50.502 39.609 35.270 42.055 44.093 48.988 50.999 50.080 50.286 49.785 50.883 50.352 50.798 50.054 50.199 35.323 35.430
 [185] 39.137 37.773 35.552 51.888 51.903 49.840 52.106 49.776 51.384 50.016 50.007 51.951 49.711 37.621 35.513 39.504 39.504 35.500 45.889 35.359 35.281 47.184 37.652
 [208] 39.245 48.394 48.434 48.460 48.511 48.450 48.471 35.490 48.577 48.469 48.589 48.527 48.529 48.534 48.513 48.544 48.561 48.534 48.507 48.504 48.570 48.531 48.499
 [231] 48.446 48.552 48.489 48.557 48.494 48.546 48.489 48.506 48.481 48.468 48.481 48.498 48.481 48.489 48.485 48.498 48.478 48.480 48.501 48.489 48.489 48.448 48.486
 [254] 48.476 48.487 48.445 48.504 48.514 48.501 48.461 48.452 48.432 48.439 48.454 48.456 48.414 48.427 48.425 48.451 48.405 48.434 48.434 48.434 48.428 48.423 48.400
 [277] 48.395 38.933 49.958 37.608 54.686 54.706 54.701 39.592 40.232 39.591 40.155 39.069 44.128 35.436 45.333 35.458 35.641 45.529 45.553 37.638 37.638 44.098 44.058
 [300] 44.236 39.885 52.055 35.398 39.014 40.287 38.996 35.451 48.831 40.425 54.412 55.183 44.683 44.689 44.686 44.635 44.941 39.004 43.950 50.230 43.542 41.360 44.638
 [323] 41.338 54.471 35.381 54.512 38.947 37.476 50.655 39.043 44.732 54.431 35.494 45.010 43.676 44.744 39.062 44.404 44.569 38.824 35.525 54.427 54.531 35.419 38.968
 [346] 44.744 44.539 42.829 54.493 54.571 54.604 42.868 44.107 44.545 44.601 37.458 40.147 35.324 35.343 50.573 50.368 41.983 48.667 35.425 36.764 35.437 44.299 43.561
 [369] 44.207 39.719 40.683 54.436 35.428 40.054 44.654 38.821 43.664 44.716 37.501 46.386 35.645 54.382 35.412 54.519 43.636 43.522 45.848 44.655 43.956 35.373 54.610
 [392] 44.665 39.029 48.579 37.436 43.634 43.687 48.492 44.531 43.934 48.711 48.212 40.145 39.027 39.120 43.613 40.189 44.692 54.647 44.775 43.500 35.419 35.447 35.492
 [415] 48.956 37.433 43.684 43.695 43.448 43.159 38.969 43.478 37.693 43.703 48.111 35.395 44.755 35.457 41.327 43.762 35.454 37.672 35.334 35.368 45.076 45.079 49.874
 [438] 49.877 49.879 49.879 35.341 35.364 35.367 35.373 35.370 35.448 35.369 35.370 35.444 35.370 44.553 44.989 35.452 35.362 42.155 35.369 37.661 50.154 49.926 49.933
 [461] 50.068 44.996 44.985 35.413 42.018 54.363 54.491 35.222 45.066 49.827 50.390 50.211 54.506 39.535 40.244 39.109 39.901 39.270 42.680 40.886 38.983 37.524 45.176
 [484] 39.678 44.257 44.499 42.274 41.304 44.001 43.199 45.552 47.311 49.786 49.483 51.172 44.764 50.958 44.592 49.938 42.103 45.076 45.070 44.609 44.944 39.459 51.114
 [507] 39.052 35.336 39.024 35.373 45.945 50.892 35.549 37.678 50.422 35.424 47.087 41.780 42.364 45.506 44.197 50.627 37.666 39.101 46.339 49.853 50.019 35.483 35.356
 [530] 35.444 50.887 38.943 38.783 37.418 40.326 47.449 52.059 35.529 35.419 42.953 53.680 35.445 35.373 54.526 35.606 43.468 54.478 42.824 35.427 35.360 35.485 42.054
 [553] 37.606 42.849 48.473 48.473 48.476 35.458 37.680 40.262 46.983 44.903 39.569 49.051 42.088 54.714 50.046 40.313 41.224 44.557 48.438 35.319 43.606 46.254 46.021
 [576] 45.060 42.153 39.438 42.159 40.423 35.417 37.653 38.725 42.919 35.290 41.287 39.059 54.481 59.447 37.637 54.531 43.650 39.116 39.004 43.573 37.448 38.914 43.582
 [599] 38.940 38.985 39.043 39.075 44.768 39.035 37.511 43.699 54.501 43.687 43.705 55.410 55.414 37.681 39.465 38.935 39.105 55.387 49.749 40.470 38.924 43.644 39.030
 [622] 39.212 39.392 55.775 44.718 38.968 44.573 39.033 37.394 47.141 39.056 43.644 43.585 43.593 55.726 37.612 54.598 39.995 44.596 38.885 37.383 55.410 44.658 38.985
 [645] 37.686 43.653 39.042 43.637 54.197 40.499 37.365 38.877 55.816 55.717 38.944 39.025 39.081 35.467 35.449 47.210 46.930 59.973 46.815 54.633 44.450 47.049 45.718
 [668] 55.820 53.666 44.631 44.637 44.088 44.799 59.011 44.599 44.788 44.128 44.145 44.038 44.472 44.203 43.890 59.322 44.125 44.130 44.099 35.438 35.359 35.327 44.559
 [691] 50.829 37.669 35.366 43.692 40.429 39.106 41.287 49.203 42.101 47.054 43.538 40.475 45.940 35.311 54.485 54.484 54.485 54.488 43.685 43.684 43.685 43.685 54.485
 [714] 54.484 54.485 54.488 43.685 43.684 43.685 43.685 35.402 35.402 50.170 44.972 45.067 48.591 52.378 35.480 50.052 39.165 39.962 35.430 37.556 50.139 47.469 42.087
 [737] 47.140 51.553 54.463 45.061 38.994 46.898 50.865 50.893 50.657 50.957 50.488 50.869 48.686 53.081 38.917 49.699 49.841 54.491 42.917 40.178 38.949 44.862 48.508
 [760] 45.259 40.505 40.439 50.018 49.978 43.961 43.683 45.529 35.373 35.404 35.355 39.434 35.320 35.579 43.181 50.073 44.091 43.258 35.291 39.025 35.402 39.066 50.016
 [783] 50.015 49.927 50.030 50.030 54.608 50.032 49.689 50.316 50.288 50.430 44.161 52.840 44.193 37.537 40.375 40.302 40.369 53.076 53.022 35.055 40.607 42.363 42.343
 [806] 40.459 40.258 42.195 46.015 42.269 45.951 40.340 40.259 54.442 37.597 54.447 38.875 35.465 42.121 44.195 43.850 43.980 47.249 43.877 44.135 46.619 43.930 44.035
 [829] 44.111 43.955 47.277 50.216 50.148 50.397 42.265 43.806 46.420 45.309 42.100 42.112 38.995 35.316 41.268 54.280 46.141 50.559 50.149 53.207 41.993 44.012 50.191
 [852] 50.100 49.996 51.456 45.072 43.614 42.464 44.575 35.425 46.874 36.020 48.387 53.446 50.786 55.023 52.441 52.651 52.005 50.646 50.509 37.324 46.181 39.003 51.643
 [875] 50.027 43.671 38.957 43.695 43.694 43.653 38.911 38.920 42.910 39.004 50.876 43.653 49.340 51.483 50.822 51.378 35.388 35.378 35.365 50.115 51.791 49.879 50.027
 [898] 49.770 50.130 50.326 49.769 50.026 50.188 38.857 40.602 40.678 44.043 38.755 48.384 48.421 48.633 48.633 48.410 49.064 48.113 40.482 44.756 45.505 54.667 54.774
 [921] 49.858 40.579 37.684 39.650 52.603 38.945 43.521 35.325 35.340 45.247 51.723 35.298 53.086 51.599 50.556 44.422 44.397 44.451 44.440 44.427 44.404 39.948 39.952
 [944] 39.950 39.961 39.952 44.395 44.438 44.402 44.390 44.399 44.402 44.427 44.451 44.422 44.458 44.413 44.440 44.431 44.438 44.413 44.424 44.440 44.395 44.422 44.386
 [967] 50.911 50.903 50.918 50.896 44.409 44.422 44.386 44.406 44.424 44.409 44.433 44.413 44.431 44.427 44.384 44.420 44.393 44.395 44.402 44.445 47.543 48.965 34.170
 [990] 43.565 49.869 49.953 50.087 54.461 44.171 48.205 35.497 44.616 35.462 46.897
 [ reached getOption("max.print") -- omitted 608 entries ]

We get the GC content of all the phages in our dataset.

gc_all <- data %>% 
  pull(`molGC (%)`)
gc_all
   [1] 42.110 39.553 26.393 32.225 31.472 32.225 24.326 55.114 29.181 25.796 26.012 26.022 26.020 26.029 26.053 26.019 26.032 25.986 26.031 25.968 25.878 25.880 25.879
  [24] 25.877 49.916 49.820 50.520 50.525 50.525 50.615 50.607 50.595 50.615 51.602 50.500 50.499 50.515 50.535 51.603 51.540 51.534 51.596 51.596 51.612 51.559 51.613
  [47] 51.613 52.449 51.500 51.553 51.522 51.464 51.515 49.518 49.506 49.503 50.574 50.711 50.754 50.677 50.738 50.733 50.732 50.734 50.730 50.733 50.734 50.734 50.734
  [70] 50.740 50.734 50.703 50.735 50.725 50.728 50.726 50.698 50.700 50.718 50.722 50.724 50.703 50.728 50.720 50.723 50.722 50.717 50.711 50.719 48.909 48.908 36.831
  [93] 62.301 36.892 33.187 33.315 36.908 36.832 28.979 43.029 26.931 54.042 57.005 51.903 51.779 51.781 53.988 53.809 47.739 48.018 49.559 53.619 53.575 53.279 54.077
 [116] 51.811 53.157 44.044 50.057 43.998 54.122 56.279 45.548 55.452 47.406 54.060 53.118 50.094 54.073 54.231 47.800 47.999 44.687 50.022 56.338 53.192 54.101 56.906
 [139] 47.605 44.673 53.813 54.013 53.939 53.931 54.002 53.702 45.864 50.807 48.535 53.848 52.370 45.622 44.974 67.415 50.155 51.690 51.667 50.535 50.700 50.237 50.734
 [162] 50.579 50.581 50.582 50.581 50.583 59.334 35.687 30.019 60.029 67.716 58.164 66.763 65.928 66.155 66.700 67.030 58.257 51.738 58.339 67.350 39.517 62.337 36.365
 [185] 37.253 36.308 38.605 38.594 38.321 39.197 49.731 38.933 35.101 49.871 46.829 43.366 34.103 40.860 43.298 35.679 40.024 49.316 40.530 44.995 48.701 40.915 37.773
 [208] 40.717 46.312 55.361 41.566 42.303 35.273 37.626 47.791 43.972 36.378 43.419 45.353 53.161 38.532 38.442 43.501 38.447 32.001 46.700 39.583 40.962 34.181 44.346
 [231] 38.665 38.358 42.232 40.576 36.006 39.400 39.529 37.373 46.386 39.875 46.379 35.529 46.107 48.339 47.028 47.805 47.432 37.634 37.241 36.102 31.904 39.310 35.132
 [254] 45.092 42.737 43.977 35.238 35.533 41.001 44.123 31.320 46.112 43.833 39.926 41.257 45.431 36.830 45.332 36.071 42.613 39.359 39.822 44.920 37.289 39.190 36.022
 [277] 35.913 43.147 38.272 40.441 41.924 37.969 38.899 56.959 33.201 43.425 37.566 41.162 37.879 37.442 40.285 34.139 34.211 36.470 38.265 34.390 32.695 40.526 33.179
 [300] 35.597 32.763 42.860 46.093 45.847 46.729 48.281 36.139 40.778 35.435 44.221 42.606 32.120 40.725 49.594 39.978 38.644 47.107 43.120 43.540 38.937 49.825 33.641
 [323] 50.929 35.374 38.185 43.828 43.314 40.854 40.953 45.422 45.891 33.195 46.867 43.972 37.223 44.694 41.823 46.034 47.870 47.023 31.608 41.642 40.863 46.459 37.439
 [346] 41.547 37.254 39.246 48.933 39.785 35.399 40.053 40.532 37.309 33.041 46.950 43.118 45.094 42.487 42.093 42.436 39.029 46.384 36.178 40.587 41.927 40.808 35.813
 [369] 42.205 34.033 40.333 43.624 32.945 29.872 54.818 45.206 58.162 55.147 55.754 55.608 52.825 62.202 55.657 55.635 62.228 44.644 57.235 55.730 55.659 62.215 52.575
 [392] 62.181 55.547 48.290 42.954 43.845 34.810 60.852 60.857 63.592 64.237 51.634 62.677 59.199 67.929 66.567 60.877 67.062 56.139 63.894 69.126 68.895 57.669 61.246
 [415] 63.327 50.436 59.144 63.519 63.395 68.426 61.680 67.804 63.557 63.020 60.499 62.744 62.935 59.034 61.422 66.452 68.934 69.318 49.365 49.317 68.747 59.191 68.871
 [438] 63.708 61.257 60.199 66.987 59.140 63.525 65.415 49.308 67.589 66.379 61.845 68.884 67.139 68.607 51.793 63.756 63.539 59.638 70.655 67.349 61.231 63.890 50.137
 [461] 68.935 61.345 64.959 52.415 52.339 36.921 35.908 67.000 67.666 64.676 61.631 63.676 59.614 68.903 66.909 66.885 64.825 54.961 62.805 67.027 66.966 63.920 39.704
 [484] 59.134 40.033 66.971 62.664 62.598 59.080 67.111 66.601 66.785 67.419 53.631 67.611 66.718 64.693 57.654 68.716 67.681 57.617 60.229 62.689 68.490 59.308 68.634
 [507] 37.465 34.103 34.591 44.424 44.366 44.470 35.355 33.411 37.019 36.783 33.015 29.915 29.834 44.908 39.348 44.103 51.754 33.464 33.414 58.909 56.907 66.592 66.466
 [530] 64.724 63.450 64.700 64.633 49.275 61.917 60.268 63.663 30.161 35.589 61.865 35.432 34.655 54.615 28.166 45.551 39.290 54.693 46.871 43.370 44.825 46.012 52.498
 [553] 45.213 43.799 52.859 50.051 49.165 34.820 67.046 50.437 50.220 50.700 53.747 50.456 52.979 45.402 40.904 45.438 41.724 68.747 68.542 68.852 68.490 68.803 68.723
 [576] 68.563 68.747 68.746 68.749 68.512 68.559 68.815 69.109 68.528 68.974 68.822 68.642 68.791 68.989 68.737 68.735 68.940 68.749 68.848 68.759 68.759 68.759 68.760
 [599] 68.509 68.727 68.185 67.032 68.765 68.736 68.598 68.652 68.733 68.727 68.709 68.517 68.590 68.982 68.512 68.731 66.985 68.154 68.636 68.536 68.769 68.529 68.753
 [622] 68.220 48.291 50.967 35.209 62.960 33.449 34.831 49.692 46.079 48.323 32.124 34.445 34.356 33.356 34.490 34.601 34.020 34.250 60.434 60.179 64.155 65.309 60.125
 [645] 60.184 55.209 57.825 64.158 60.163 64.021 60.182 60.045 64.193 57.680 60.029 64.197 60.058 60.067 60.227 61.193 60.178 59.865 60.161 39.446 36.858 60.990 60.344
 [668] 49.842 51.017 35.342 43.501 50.590 37.354 47.589 65.758 34.808 71.802 62.796 65.522 66.626 68.500 67.987 68.506 68.936 47.069 49.960 66.604 36.161 37.559 39.745
 [691] 38.761 33.962 41.865 41.837 50.805 47.928 49.713 25.150 25.137 25.205 53.088 50.101 30.382 68.009 54.083 37.973 42.992 43.487 34.213 38.681 38.693 38.701 38.709
 [714] 38.756 38.713 38.715 38.719 38.723 38.699 38.682 38.714 38.684 38.756 38.773 38.748 38.648 39.928 35.993 46.670 35.427 48.281 50.880 64.192 64.065 64.181 64.172
 [737] 64.174 51.340 33.717 38.489 35.425 51.913 38.602 67.418 43.743 52.254 35.330 56.095 50.764 53.416 52.479 48.371 64.754 50.864 48.911 49.026 49.297 31.107 30.073
 [760] 30.265 30.068 31.513 42.599 37.616 42.567 42.521 42.140 44.668 44.824 44.886 45.872 45.515 44.811 48.865 42.441 44.990 43.307 43.121 42.408 42.561 48.544 42.914
 [783] 43.132 43.139 42.518 45.660 37.596 37.153 45.400 45.482 43.135 43.135 43.139 37.661 42.883 37.658 45.318 42.696 37.031 43.137 43.266 45.801 45.361 45.680 45.520
 [806] 42.645 45.361 48.471 45.852 43.418 37.485 61.264 63.437 66.545 67.649 66.597 68.852 68.303 67.613 66.579 66.432 60.027 63.064 64.770 47.104 66.480 71.183 50.071
 [829] 66.551 68.409 67.710 68.317 49.957 62.909 65.393 66.483 59.061 66.381 50.539 53.920 44.549 40.942 60.927 60.706 60.314 60.368 60.194 67.752 46.600 52.732 51.300
 [852] 68.984 67.283 66.617 63.752 42.423 42.418 45.301 46.633 50.010 52.423 36.192 61.644 41.629 60.370 49.902 50.070 65.527 65.801 65.872 63.085 61.922 38.945 48.176
 [875] 50.349 36.057 55.901 29.885 41.252 42.552 32.916 35.019 49.627 51.256 35.481 50.180 63.425 65.862 52.575 54.528 45.890 39.403 50.384 50.943 51.551 50.868 50.699
 [898] 50.786 51.192 44.384 35.182 38.712 45.521 37.423 47.008 40.014 45.798 45.539 41.859 36.663 36.652 38.605 37.730 40.467 33.219 39.334 39.917 39.422 61.152 62.512
 [921] 59.068 59.303 61.805 57.680 65.541 62.538 62.847 62.024 65.543 65.528 35.357 50.951 42.014 64.606 50.025 49.931 49.924 49.922 64.445 45.829 63.445 55.716 55.731
 [944] 55.501 34.953 39.590 36.896 35.338 54.324 54.517 39.808 49.158 39.848 55.462 67.509 41.047 41.646 33.258 61.641 35.597 62.919 47.016 41.347 40.528 39.006 46.997
 [967] 55.546 36.827 34.824 59.505 49.761 46.169 49.644 50.971 48.296 47.043 30.067 65.941 56.996 54.425 58.568 68.078 48.196 47.879 45.556 59.121 52.034 43.575 43.655
 [990] 35.805 40.519 35.215 31.108 31.864 65.687 66.637 68.599 64.736 63.324 64.928
 [ reached getOption("max.print") -- omitted 17406 entries ]

We compare the mean GC content of the phages that target Escherichia with the population mean (i.e., the mean GC content of all the phages in our dataset) using t.test():

t.test(gc_escherichia, mu=mean(gc_all))

    One Sample t-test

data:  gc_escherichia
t = -28.68, df = 1607, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 48.42675
95 percent confidence interval:
 44.23457 44.77127
sample estimates:
mean of x 
 44.50292 
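
To see where these numbers come from, the sketch below recomputes the one-sample t statistic, the two-sided p-value, and the 95% confidence interval by hand and checks them against `t.test()`. It uses synthetic data from `rnorm()` with made-up parameters, not the phage measurements; the names `x`, `mu`, `se`, `t_stat`, `p_val`, and `ci` are illustrative.

```r
set.seed(1)  # reproducible synthetic data, not the phage measurements
x  <- rnorm(50, mean = 44.5, sd = 5.5)  # hypothetical sample
mu <- 48.4                              # hypothetical reference mean

# t statistic: distance of the sample mean from mu in standard-error units
n      <- length(x)
se     <- sd(x) / sqrt(n)
t_stat <- (mean(x) - mu) / se

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = n - 1)

# 95% confidence interval for the mean
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

# The same quantities are stored in the object that t.test() returns
res <- t.test(x, mu = mu)
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_val)
all.equal(as.numeric(res$conf.int), ci)
```

In the output above, df = 1607 equals n - 1, so 1608 Escherichia phages entered the test.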

Two-sample t-test

We get the GC content of the phages that target Staphylococcus.

gc_staphylococcus <- data %>% 
  filter(Host == "Staphylococcus") %>% 
  pull(`molGC (%)`)
gc_staphylococcus
  [1] 35.687 30.019 34.103 34.591 35.355 33.411 29.915 29.834 33.464 33.414 30.161 34.655 33.449 34.601 34.020 34.250 30.382 36.192 29.885 35.019 34.953 35.338 33.258
 [24] 30.067 33.312 33.463 34.847 30.709 29.136 30.403 35.375 35.724 35.239 35.008 35.991 35.114 33.459 30.709 30.709 35.155 28.139 36.327 36.229 30.386 35.832 37.225
 [47] 35.375 36.027 34.925 34.947 35.563 30.350 30.668 45.827 34.046 34.987 30.386 30.038 33.480 34.135 33.965 30.703 32.992 34.269 33.418 33.615 32.957 34.269 33.418
 [70] 32.722 33.931 33.704 34.560 34.950 33.267 34.273 34.755 34.726 34.016 30.579 33.748 29.576 34.648 35.259 35.091 35.094 34.070 33.910 35.351 34.257 33.274 34.080
 [93] 33.544 34.112 34.745 34.063 25.090 31.001 33.246 34.908 34.675 35.565 34.738 34.586 34.156 34.103 36.020 33.347 30.261 30.388 35.958 35.722 35.470 35.534 33.730
[116] 34.587 34.077 34.302 34.250 32.871 29.427 30.209 30.422 30.377 30.771 34.368 34.967 34.718 30.936 35.108 29.413 30.754 30.829 30.756 31.591 29.688 30.611 30.573
[139] 30.885 29.415 30.613 30.702 29.662 31.338 31.335 31.351 30.801 30.803 31.673 31.591 30.853 30.889 29.667 28.001 31.332 27.961 31.247 30.799 30.852 30.797 31.327
[162] 31.329 31.329 30.770 33.988 34.126 34.591 25.029 30.180 34.179 29.889 29.909 29.966 30.105 28.018 28.019 27.991 27.992 28.044 27.934 29.354 34.835 34.720 29.273
[185] 33.976 33.972 33.959 30.355 29.872 29.806 30.361 30.167 29.871 30.306 30.407 30.392 30.379 30.431 30.412 30.361 30.220 32.964 30.358 25.133 34.059 34.909 30.811
[208] 29.314 28.435 34.913 30.366 26.833 34.540 34.719 34.699 34.769 34.587 31.775 25.132 24.753 29.397 29.886 25.178 35.224 33.536 35.733 29.569 30.574 30.451 30.426
[231] 30.458 33.121 33.057 33.117 33.113 33.257 33.216 33.137 33.276 33.438 33.241 33.407 33.281 30.236 33.779 30.161 30.315 35.740 30.400 28.764 28.755 28.773 28.810
[254] 28.793 33.287 29.878 29.874 28.867 36.882 29.293 30.409 35.314 27.980 27.987 27.992 34.575 33.143 30.127 30.405 30.498 30.781 30.435 30.278 30.427 30.778 30.785
[277] 30.783 31.395 29.973 30.833 31.397 30.781 30.269 29.889 30.279 29.873 29.875 30.831 30.278 29.888 30.726 30.833 30.425 30.425 30.399 30.464 34.946 34.437 28.866
[300] 34.497 30.192 35.695 29.343 34.017 33.963 33.419 31.617 30.447 30.325 35.625 34.091 36.421 35.900 35.469 35.353 35.673 34.991 29.960 28.966 30.879 30.610 30.324
[323] 30.206 33.708 30.352 30.426 30.387 30.332 30.314 35.040 30.180 31.917 29.590 29.955 29.909 29.362 30.389 30.367 34.567 30.180 30.172 33.554 30.423 33.062 33.349
[346] 33.329 33.305 33.080 30.254 30.252 30.388 29.608 29.799 30.378 28.962 32.885 33.349 30.247 28.955 35.393 31.917 30.223 31.113 29.262 30.215 30.389 30.387 30.422
[369] 30.417 30.396 35.791 35.805 35.813 35.700 30.863 30.863 30.512 33.111 33.463 33.747 32.960 30.259 29.858 31.824 31.827 32.014 31.848 35.349 34.166 33.428 33.657
[392] 32.850 34.686 34.488 30.405 34.129 29.823 30.096 33.277 30.001 58.142 29.348 29.340 35.587 35.695 35.537 35.798 28.899 34.261 29.018 29.354 29.974 34.661 34.730
[415] 34.246 35.645 35.742 33.290 30.534 33.716 35.553 34.829 30.326 30.323 32.902 35.411 30.309 30.219 33.016 30.328 30.328 29.588 27.960 30.425 33.982 35.102 29.568
[438] 33.370 30.385 30.217 29.296 30.327 35.135 33.084 34.464 33.404 35.977 30.243 30.392 30.393 30.412 30.416 27.930 30.310 33.320 34.539 34.446 34.542 34.267 29.973
[461] 35.419 33.341 35.599 33.191 30.011 33.044 30.475 30.421 30.371 34.171 34.067 33.216 30.228 35.635 33.709 33.787 34.036 33.415 29.215 28.908 33.309 33.311 30.422
[484] 33.676 28.944 32.993 29.325 33.993 33.568 35.060 36.916 29.287 31.417 30.602 33.555 33.476 33.040 33.454 33.751 33.503 33.469

We compare the mean GC content of the phages that target Escherichia with that of the phages that target Staphylococcus using t.test():

t.test(gc_escherichia, gc_staphylococcus)

    Welch Two Sample t-test

data:  gc_escherichia and gc_staphylococcus
t = 67.041, df = 1699, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 11.95993 12.68082
sample estimates:
mean of x mean of y 
 44.50292  32.18254 
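
The header reads "Welch Two Sample t-test" because `t.test()` does not assume equal variances by default. The sketch below, again on synthetic data with made-up parameters, recomputes the Welch statistic and the Welch-Satterthwaite degrees of freedom by hand; the names `vx`, `vy`, `t_stat`, and `df_welch` are illustrative.

```r
set.seed(2)  # reproducible synthetic data, not the phage measurements
x <- rnorm(40, mean = 44.5, sd = 5.5)  # hypothetical group 1
y <- rnorm(25, mean = 32.2, sd = 2.5)  # hypothetical group 2

# Squared standard error of each group mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 /
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

res <- t.test(x, y)
all.equal(unname(res$statistic), t_stat)
all.equal(unname(res$parameter), df_welch)

# var.equal = TRUE switches to the pooled-variance Student test,
# whose degrees of freedom are length(x) + length(y) - 2
t.test(x, y, var.equal = TRUE)$parameter
```

This approximation is why the df = 1699 reported above is smaller than the pooled value of 1608 + 501 - 2 = 2107 that the group sizes would otherwise give.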

  1. De La Salle University, Manila, Philippines
