Importing Data into R

Usually, people start with data in different file types. In R, file types tend to require their own importing functions. In this post, I will cover how to import data into R from the most common file types.

Ellena Girma https://rforplantbreeders.netlify.app/about.html
2022-05-11

Data usually arrives in a variety of file types. This can make importing data feel overwhelming, since each file type tends to require its own importing function. In this post, I’ll cover how to import data into R from the most common file types.

Some things to take care of before we start importing

Data

If you have locally saved data ready to be imported, make sure you know the folder’s location. If you don’t have access to data right now but still want to practice, I suggest using data from this paper [1]. The data contains information about the seed yield, 100 seed weight, number of seeds/pod, number of pods/plant, and days to 75% flowering for 99 common bean variants grown in 3 locations. The data is listed in the paper under “Appendix A. Supplementary data”.

R Workspace

Make sure you clear your RStudio environment, or you have opened a new session. To open a new session, go to the Session tab and choose New Session. If you want to clear your workspace without opening a new session, go to the Environment tab on the top-right window. Then, click the broom icon to clear your environment.

Working Directory

Your working directory is where R looks by default for files you want to import. I suggest setting your working directory to the folder that has your data. Set your working directory using the function setwd("localfilepath"). Setting your working directory is also useful when you want to save files in a specific folder.

R can also import files from folders that have not been set as the working directory. To do this, specify the folder’s file path in the importing function.
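The two approaches can be compared side by side. The following is a minimal sketch, assuming a throwaway folder and file (import_demo and beans.csv are hypothetical names, created in a temporary directory only for illustration):

```r
# Hypothetical folder and file, created in a temporary location for illustration
data_dir <- file.path(tempdir(), "import_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(genotype = c("G1", "G2"), yield = c(1.8, 2.1)),
          file.path(data_dir, "beans.csv"), row.names = FALSE)

# Option 1: set the working directory, then use the bare file name
old_wd <- setwd(data_dir)   # setwd() returns the previous working directory
from_wd <- read.csv("beans.csv")
setwd(old_wd)               # restore the original working directory

# Option 2: leave the working directory alone and pass the full path
from_path <- read.csv(file.path(data_dir, "beans.csv"))

# both approaches read the same data
identical(from_wd, from_path)
```

You can run getwd() at any time to check which folder R is currently using.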

Other Things to Remember!

Structure and Format of Your Data

To import data into R, it’s important to know what file type you are starting with. Once you know the file type, you can install and load the corresponding package. The most common data file types are Excel files, CSV files, and TXT files. The data you want to work with might also be on the web.

Excel files

For Excel files, you can use the XLConnect package or the readxl package.

XLConnect

XLConnect lets you work with Excel files from within R. Using XLConnect, you can create a new workbook or load an existing one. Then, you can read data from one of the Excel sheets or write data from R into a worksheet. Below, I show how to load an existing workbook, get the names of the sheets in the workbook, then read data from a worksheet in the workbook.

#load the package into R session
library(XLConnect)

#setting file path to an object
file_path_xl <- "C:/Users/eruph/Documents/TB/Importing_Blog/workbook.xlsx"

#create connection to an existing excel file to R using file path object
wb <- loadWorkbook(file_path_xl, create = FALSE) #set create = TRUE if you want to create a new file

#get names of all worksheets in your workbook
getSheets(wb)
[1] "Sheet1" "Sheet2"

#reading data from worksheet 2 specifying start & end of cols & rows
wb_2 <- readWorksheet(wb, sheet = "Sheet2", startRow = 3, startCol = 1, endRow = 15, endCol = 3)
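Going in the other direction (writing data from R into a worksheet) might look like the sketch below. It assumes XLConnect and its Java dependency are installed; the file name, sheet name, and data frame are hypothetical and exist only for illustration.

```r
library(XLConnect)

# Hypothetical output file in a temporary folder
out_path <- file.path(tempdir(), "new_workbook.xlsx")

# create = TRUE makes a new workbook if the file does not exist yet
wb_new <- loadWorkbook(out_path, create = TRUE)

# add a worksheet and write a small, made-up data frame into it
createSheet(wb_new, name = "yield")
yield_df <- data.frame(genotype = c("G1", "G2", "G3"),
                       seed_yield = c(1.8, 2.1, 1.6))
writeWorksheet(wb_new, yield_df, sheet = "yield")

# the file on disk is only created or updated when the workbook is saved
saveWorkbook(wb_new)

# read the sheet back to check the round trip
yield_back <- readWorksheet(wb_new, sheet = "yield")
```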

readxl

The readxl package can be used to load data from .xls and .xlsx formats. Below, I load an .xls file and an .xlsx file using the read_excel() and read_xlsx() functions.

#load the package into R session
library(readxl)

#setting file path
file_path_xls <- "C:/Users/eruph/Documents/TB/Importing_Blog/read.xls"

#importing .xls or .xlsx into R
xls_data <- read_excel(path = file_path_xls, sheet = 1, col_names = TRUE)

#list sheet names in the excel file
excel_sheets(file_path_xls)
[1] "Sheet1" "Sheet2" "Sheet3"

#setting file path
file_path_xlsx <- "C:/Users/eruph/Documents/TB/Importing_Blog/read2.xlsx"

#reading data from specific cells (example, cells F3 to H12)
data <- read_xlsx(file_path_xlsx, sheet = 3, range = 'F3:H12')
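If a workbook has several sheets and you want all of them, one option is to loop over the sheet names. This is a minimal sketch rather than the only way to do it; it uses the sample workbook that ships with readxl (via readxl_example()) so that it runs without any local files, and the object names are my own.

```r
library(readxl)

# readxl_example() returns the path to a sample workbook shipped with readxl
example_path <- readxl_example("datasets.xlsx")

# get the sheet names, then read each sheet into its own data frame
sheet_names <- excel_sheets(example_path)
all_sheets <- lapply(sheet_names, function(s) read_excel(example_path, sheet = s))
names(all_sheets) <- sheet_names

# number of rows and columns in each sheet
sapply(all_sheets, dim)
```

Each sheet can then be pulled out of the list by name, for example all_sheets[[sheet_names[1]]].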

CSV and TXT files

CSV files separate values using commas or semicolons, while TXT files typically use tabs or spaces. I have grouped CSV and TXT files together because they can be imported using either the utils package or the readr package.

utils package

CSV files using utils

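For CSV files, check if your file is separated by commas or semicolons. Use read.csv() if your file is comma-separated and read.csv2() if it is semicolon-separated. read.csv2() also expects a comma as the decimal mark, which is the usual convention in semicolon-separated files.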

#setting file path
file_path_csv <- "C:/Users/eruph/Documents/TB/Importing_Blog/csvfile.csv"

#reading data while specifying that the data is comma separated and that the first row has header names
csv_df <- read.csv(file_path_csv, header = TRUE, sep = ",")
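To see the difference between the two functions, here is a small sketch that writes the same made-up values in both conventions to temporary files and reads them back. The file contents and object names are assumptions for illustration only.

```r
# two temporary files holding the same made-up values in the two conventions
comma_file <- tempfile(fileext = ".csv")
semi_file  <- tempfile(fileext = ".csv")
writeLines(c("genotype,seed_weight", "G1,35.2", "G2,41.7"), comma_file)
writeLines(c("genotype;seed_weight", "G1;35,2", "G2;41,7"), semi_file)

# read.csv() defaults to sep = "," and dec = "."
comma_df <- read.csv(comma_file)

# read.csv2() defaults to sep = ";" and dec = ","
semi_df <- read.csv2(semi_file)

# both data frames should hold the same values
all.equal(comma_df, semi_df)
```

Reading the semicolon file with read.csv() instead would not raise an error, but it would return a single column, so it is worth checking the result with str() or head() after importing.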

TXT files using utils

To read TXT files using utils, you can use the read.table() or the read.delim() functions.

#setting file path
file_path_txt <- "C:/Users/eruph/Documents/TB/Importing_Blog/environment_txt.txt"

#importing file with the first row as variable names; sep = "" splits on any whitespace and fill = TRUE adds blank field/s to rows with missing values
environment_txt <- read.table(file_path_txt, header = TRUE, sep = "", fill = TRUE)

read.table() and read.delim() from the utils package are not limited to one file type. Above, I used read.table() to import a text file, but it can also import CSV files. When using these two functions, pay attention to the sep (separator) argument: set it to the character that separates values in the file you are importing.
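As a quick illustration of the sep argument, the sketch below reads a comma-separated file and a tab-separated file with read.table(), then reads the tab-separated file again with read.delim(). The temporary files and their contents are made up for the example.

```r
# temporary files with the same made-up values, one comma-separated, one tab-separated
csv_file <- tempfile(fileext = ".csv")
tab_file <- tempfile(fileext = ".txt")
writeLines(c("location,days_to_flowering", "Site A,42", "Site B,45"), csv_file)
writeLines(c("location\tdays_to_flowering", "Site A\t42", "Site B\t45"), tab_file)

# read.table() handles both once sep matches the file
from_csv <- read.table(csv_file, header = TRUE, sep = ",")
from_tab <- read.table(tab_file, header = TRUE, sep = "\t")

# read.delim() already defaults to header = TRUE and sep = "\t"
from_delim <- read.delim(tab_file)
```

Note that sep = "" splits on any whitespace, so it would break a value such as "Site A" into two fields. For tab-separated files whose values contain spaces, sep = "\t" is the safer choice.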

R documentation is a great resource to learn more about utils functions. You can also run ?function_name (for example, ?read.csv) to get more information about a function within RStudio.

readr package

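The readr functions are typically faster than their utils counterparts, especially on large files. All of readr’s import functions start with read_ and are followed by the file format type. Here are some examples: read_csv() for comma-separated files, read_csv2() for semicolon-separated files, read_tsv() for tab-separated files, and read_delim() for files with any other delimiter.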
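A short sketch of a few of these functions in use, assuming readr (version 2.0 or later, for the show_col_types argument) is installed. The temporary files and values are made up for illustration.

```r
library(readr)

# temporary files with made-up values, one comma-separated, one tab-separated
csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".txt")
writeLines(c("genotype,pods_per_plant", "G1,12", "G2,15"), csv_file)
writeLines(c("genotype\tpods_per_plant", "G1\t12", "G2\t15"), tsv_file)

# show_col_types = FALSE hides the column type message readr prints by default
csv_tbl <- read_csv(csv_file, show_col_types = FALSE)
tsv_tbl <- read_tsv(tsv_file, show_col_types = FALSE)

# read_delim() is the general version; the delimiter is given explicitly
delim_tbl <- read_delim(csv_file, delim = ",", show_col_types = FALSE)
```

Unlike the utils functions, these return tibbles, which print more compactly but otherwise behave like data frames.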

Check out the readr page to get a more detailed look at the package.

Web Sources

Sometimes, the data you need will be on the web. The readr and utils packages can be used to get CSV, TXT and Excel format data from the web.

CSV and TXT files from the web using the readr package

To import CSV and TXT files using the readr package, pass the web link into read_csv() for a CSV file or read_tsv() for a tab-separated TXT file. Take a look at the example code below.

#assign the link to an object
url_csv <- "http://link.csv"

#call object with link to import the data
#data_web <- read_csv(url_csv)

#assign the link to an object
url_delim <- "http://link.txt"

#call object with link to import the data
#data_web <- read_tsv(url_delim)

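Excel files from the web using the utils and readxl packages

To import Excel files from the web, first download the file using download.file() from the utils package, then read the downloaded file using read_excel() from the readxl package.

#mode = "wb" downloads the file in binary mode, which keeps Excel files intact on Windows
#download.file(url, destfile = destination_path, mode = "wb")
#web_excel <- read_excel(destination_path)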
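The two steps can be wrapped in a small helper. This is only a sketch: read_excel_from_web() is a hypothetical function name of my own, not part of utils or readxl, and the link in the usage line is a placeholder that needs to be replaced with a real one.

```r
library(readxl)

# hypothetical helper: download an Excel file to a temporary path, then read it
read_excel_from_web <- function(url, sheet = 1, fileext = ".xlsx") {
  destination_path <- tempfile(fileext = fileext)
  # mode = "wb" writes the download in binary mode, which matters on Windows
  download.file(url, destfile = destination_path, mode = "wb")
  read_excel(destination_path, sheet = sheet)
}

# usage (replace the placeholder with a real link before running):
# web_excel <- read_excel_from_web("http://link.xlsx")
```

Saving to a temporary file keeps your working directory clean. If you want to keep the downloaded file, pass your own destination path to download.file() instead.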

In case this blog post did not cover the data source you used, please check out some of the resources I have listed below. Hopefully, one of them will cover your data source.

Last updated on 2022-05-11 11:27:25 EAT

  1. Philipo M, Ndakidemi PA, Mbega ER. Environmentally stable common bean genotypes for production in different agro-ecological zones of Tanzania. Heliyon. 2021 Jan 19;7(1):e05973. doi: 10.1016/j.heliyon.2021.e05973