In this section of R efficiency, we will go over how you can import and export your data at lightning speed with just a few lines of code.
In the experiment below, I compare two different methods of importing and exporting data. The first method uses the read and write functions native to R, read.csv() and write.csv(). The second uses fread() and fwrite() from the data.table package. As you will see from the results of the experiment, the native functions are no match for data.table.
The data.table package's "fast and friendly file finagler", fread(), takes most of the thinking out of importing your data. fread() inspects a sample of rows from the file to detect the separator, the header, and the data type of each column, then applies those types to the rest of the data set. Getting the column types right up front helps the import run faster.
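To make the type detection concrete, here is a minimal sketch that writes the built-in iris data to a temporary CSV and reads it back with fread(). The temporary file and the variable names are only for illustration and are not part of the experiment.

```r
library(data.table)

# Write the built-in iris data to a temporary CSV so there is a file to read.
tmp <- tempfile(fileext = ".csv")
write.csv(iris, tmp, row.names = FALSE)

# fread() works out the separator, the header row, and the column types itself.
dt <- fread(tmp)

# Inspect the class that fread() assigned to each column.
col_types <- sapply(dt, class)
print(col_types)
```

The four measurement columns should come back as numeric and Species as character, without any type being specified by hand.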
The data.table package also has an fwrite() function, which writes data files much faster than write.csv(). The native function converts all of the data to strings before writing it to a file, which takes up more RAM and time.
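As a quick illustration, the sketch below writes the same data with both functions and checks that the two files read back with the same shape and column names. The file paths are temporary files chosen only for this example.

```r
library(data.table)

# Two temporary output files, one per writer.
out_base <- tempfile(fileext = ".csv")
out_dt   <- tempfile(fileext = ".csv")

# Base R writer; row.names = FALSE keeps the row index out of the file.
write.csv(iris, out_base, row.names = FALSE)

# data.table writer; it accepts a plain data.frame as well as a data.table.
fwrite(iris, out_dt)

# Read both files back and compare their shape and column names.
a <- fread(out_base)
b <- fread(out_dt)
same_layout <- identical(dim(a), dim(b)) && identical(names(a), names(b))
print(same_layout)
```

Note that fwrite() does not write row names, so no extra argument is needed to match the write.csv() call here.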
We test the two methods of import and export on iris, a dataset commonly used in R. I duplicated the iris dataset many times to get a dataset with about 19 million rows and 5 columns, which is about 750 MB of data. I ran each function 10 times on the same dataset to find its mean duration.
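A scaled-down sketch of this kind of benchmark might look like the following. It is not the exact script used for the experiment: the copies value, the mean_elapsed() helper, and the temporary path are assumptions made for illustration, and the data is kept small so the example runs in seconds. Raising copies brings it closer to the 19 million rows used here.

```r
library(data.table)

# Stack iris on itself; 100 copies gives 15,000 rows (hypothetical small size).
copies <- 100
big <- rbindlist(rep(list(iris), copies))
path <- tempfile(fileext = ".csv")

# Hypothetical helper: mean elapsed seconds over `reps` runs of a function.
mean_elapsed <- function(f, reps = 10) {
  mean(replicate(reps, system.time(f())[["elapsed"]]))
}

# Time the writers first so the file exists before the readers run.
timings <- c(
  write.csv = mean_elapsed(function() write.csv(big, path, row.names = FALSE)),
  fwrite    = mean_elapsed(function() fwrite(big, path)),
  read.csv  = mean_elapsed(function() read.csv(path)),
  fread     = mean_elapsed(function() fread(path))
)
print(timings)
```

On a dataset this small the gap between the two methods will be much less dramatic than on the full-size data, so treat the numbers it prints as a sanity check rather than a reproduction of the results.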
Statistics on data:
fread() is about 20 times faster than read.csv().
fwrite() is about 124 times faster than write.csv().
Read more about the data.table package here.