R is great open-source software for working with plant breeding data! Over the last year, I have been using R as my primary tool for data analysis. In this blog post, I give a quick introduction to R for plant breeders.
To use R, you should download R and RStudio.
I have linked some resources on my resources page that will help you get R and RStudio set up. You will also find resources there for getting familiar with RStudio’s interface.
When you are ready to start working on your data, set your working directory to the folder where your data is saved. Doing so also lets you easily save any output files (like plots) to the same folder.
#set working directory by adding file path like below
#setwd("C:/Users/eruph/Documents/Technical-Blog_EG/Technical-Blog/_posts/2022-03-18-introduction-to-r-for-plant-breeding")
#check working directory
getwd()
[1] "C:/Users/eruph/Documents/Technical Blog_EG/Technical-Blog/_posts/2022-03-18-introduction-to-r-for-plant-breeding"
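As a minimal sketch of why the working directory matters, the lines below write a small table to a file using only its name and then read it back. The file name example_yield.csv and the values in it are made up for illustration, and a temporary folder stands in for your project folder so nothing on your computer is overwritten.

```r
# Use a temporary folder as a stand-in for your project folder (assumption)
old_wd <- setwd(tempdir())

# Made-up example data, only for illustration
example_yield <- data.frame(plot = 1:3, yield = c(1200, 1550, 1320))

# A bare file name is resolved relative to the working directory
write.csv(example_yield, "example_yield.csv", row.names = FALSE)
file.exists("example_yield.csv")

# Read the file back from the same folder
read_back <- read.csv("example_yield.csv")
read_back

# Return to the folder you started in
setwd(old_wd)
```

Because both the write and the read use a bare file name, the same script keeps working if you move the whole folder, as long as you set the working directory first.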
All code lines that start with a hash (#) sign are comment lines. Comments help you and others who read your code understand what a line or section of code is doing. If you put a # in front of a line of code, that line will not run, which is helpful when you are testing what different lines of code do.
Packages are how you access the functions you want to use in R. There are two main things you have to do when you want to use an R package: install it (once), and then load it in each new session.
There are two ways to install a package in R. The first is through the ‘Packages’ tab in the bottom-right pane of RStudio. The second is with code. Below is an example of installing and loading tidyverse, a collection of R packages.
#installing the tidyverse package
#install.packages("tidyverse")
#loading tidyverse for the session
#library(tidyverse)
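One common pattern, sketched here as an assumption about how you might organise your own script, is to install a package only when it is missing. requireNamespace() is a base R function that reports whether a package is available. The install line is left commented out so the sketch does not download anything unless you choose to.

```r
# Check whether a package is installed without loading it
has_tidyverse <- requireNamespace("tidyverse", quietly = TRUE)

if (!has_tidyverse) {
  message("tidyverse is not installed yet")
  # install.packages("tidyverse")  # uncomment to install
} else {
  library(tidyverse)  # load it for this session
}

# Packages that ship with R, such as stats, are always available
requireNamespace("stats", quietly = TRUE)
```

You only need to install a package once per computer, but library() has to be run again every time you start a new R session.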
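Some of the packages included in the core tidyverse are ggplot2 (plotting), dplyr (data manipulation), tidyr (tidying data), and readr (reading data files).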
Learn more about the tidyverse at https://www.tidyverse.org/.
In R, you can run mathematical operations and functions such as logarithms, exponentials, and square roots. Below are some examples. When you run each line of code, the answer will be shown in the console window (bottom left window) of RStudio’s interface.
12 + 2 #addition
[1] 14
12 - 2 #subtraction
[1] 10
12 * 2 #multiplication
[1] 24
12 / 2 #division
[1] 6
12 ^ 2 #exponentiation
[1] 144
log2(16) #base-2 logarithm
[1] 4
sqrt(16) #square root
[1] 4
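A few more built-in operations that come up often, shown as a short sketch with arbitrary numbers:

```r
log(100, base = 10)  # logarithm with a base you choose
log10(1000)          # base-10 logarithm
exp(1)               # e raised to the power 1
17 %% 5              # remainder after division (modulo)
17 %/% 5             # integer division
round(3.14159, 2)    # round to 2 decimal places
(12 + 2) * 3         # parentheses control the order of operations
```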
In R, you will often need to store a value or an object so that you can use it in another function. This is called assignment. Assigning a value or object to a variable saves it in your environment to be used later.
#Assigning value 4 to x
x <- 4
#Assigning value 6 to y
y <- 6
#Assigning z and using x & y
z <- x + y
#Assigning an example dataset from the built-in datasets package to a variable
plant_growth <- datasets::PlantGrowth
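Once something is assigned, typing its name prints it, and you can pass it to other functions. A brief sketch using the objects created above (repeated here so it runs on its own):

```r
x <- 4
y <- 6
z <- x + y
z                     # typing the name prints the stored value

plant_growth <- datasets::PlantGrowth
nrow(plant_growth)    # number of rows (observations)
names(plant_growth)   # column names

# Store a result so it can be reused later
mean_weight <- mean(plant_growth$weight)
mean_weight

ls()                  # list the objects in your environment
```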
Use the function class() to check the data type of a saved object, such as an entire dataset or a single column in it.
#calling first 6 rows of dataset loaded earlier
head(plant_growth)
weight group
1 4.17 ctrl
2 5.58 ctrl
3 5.18 ctrl
4 6.11 ctrl
5 4.50 ctrl
6 4.61 ctrl
#class of dataset
class(plant_growth)
[1] "data.frame"
#class of specific column in dataset called group
class(plant_growth$group)
[1] "factor"
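To check every column at once rather than one at a time, one option is sapply(), which applies a function to each column of a data frame. This is a sketch of one approach, not the only one:

```r
plant_growth <- datasets::PlantGrowth

# Class of every column in the dataset
column_classes <- sapply(plant_growth, class)
column_classes

# Levels of the factor column, i.e. the treatment groups
levels(plant_growth$group)

# Helper checks that return TRUE or FALSE
is.numeric(plant_growth$weight)
is.factor(plant_growth$group)
```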
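Use the function str() to check the data structure of an object. The outputs below show str() applied to a character vector, a numeric matrix, a factor, a data frame, and a list.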
chr [1:7] "a" "b" "c" "d" "e" "f" "g"
num [1:2, 1:3] 5 6 7 8 9 10
Factor w/ 3 levels "blue","red","yellow": 2 3 1
#data.frame
data_frame_1 <- data.frame (
Variety = c("SCN 11", "Embean 11", "Jesca"),
Yield = c(1200, 1550, 1320),
Seed_weight = c(60, 30, 45)
)
str(data_frame_1)
'data.frame': 3 obs. of 3 variables:
$ Variety : chr "SCN 11" "Embean 11" "Jesca"
$ Yield : num 1200 1550 1320
$ Seed_weight: num 60 30 45
List of 3
$ : chr "Arusha"
$ : chr "Lyamungu"
$ : chr "Uyole"
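The code that produced the vector, matrix, factor, and list outputs above is not shown, so here is a sketch that yields the same structures. The object names (vector_1, matrix_1, factor_1, list_1) are my own placeholders, not necessarily the ones originally used.

```r
# character vector
vector_1 <- c("a", "b", "c", "d", "e", "f", "g")
str(vector_1)

# numeric matrix with 2 rows and 3 columns, filled column by column
matrix_1 <- matrix(c(5, 6, 7, 8, 9, 10), nrow = 2, ncol = 3)
str(matrix_1)

# factor; levels are sorted alphabetically by default
factor_1 <- factor(c("red", "yellow", "blue"))
str(factor_1)

# list
list_1 <- list("Arusha", "Lyamungu", "Uyole")
str(list_1)
```

Note that the factor stores each value as a position among its levels, which is why "red", "yellow", "blue" shows up as 2 3 1.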
R is a great tool that more plant breeders can use for their analyses. You can use R to explore your data before performing more specific analyses: it is easy to get descriptive statistics or make exploratory plots that show how your data are distributed and how much they vary. You can also conduct Analysis of Variance (ANOVA) and Principal Component Analysis (PCA), and create AMMI and GGE biplots.
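As a small taste of this, the sketch below uses the PlantGrowth example dataset from earlier to get descriptive statistics, draw an exploratory boxplot, and fit a one-way ANOVA with base R functions. It is only an illustration on an example dataset; a real trial would need a model that matches its experimental design.

```r
plant_growth <- datasets::PlantGrowth

# Descriptive statistics for every column
summary(plant_growth)

# Mean weight for each treatment group
group_means <- tapply(plant_growth$weight, plant_growth$group, mean)
group_means

# Exploratory boxplot of weight by group
boxplot(weight ~ group, data = plant_growth,
        xlab = "Group", ylab = "Weight")

# One-way ANOVA: does mean weight differ among groups?
anova_fit <- aov(weight ~ group, data = plant_growth)
anova_table <- summary(anova_fit)
anova_table
```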