A set of packages developed by RStudio (now Posit) that facilitate:
Reading data (readr) and manipulating datasets (dplyr)
Working with text (stringr), dates and times (lubridate) and categorical variables (forcats)
Creating graphics (ggplot2)
Functional programming over lists and dataframes (purrr)
And many other things…
The tidy data concept
Each variable has its own column;
Each observation has its own row;
Each value (one observation of one variable) sits in its own cell.
Note
Concept popularized by Hadley Wickham.
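To make the three rules concrete, here is a minimal sketch on a small made-up table (the columns country, year and value are hypothetical). Spreading years across column names breaks the first rule, because "year" is then a variable stored in headers. tidyr::pivot_longer(), from the tidyr package of the tidyverse, moves those headers into a proper column:

```r
library(tibble)
library(tidyr)

# Hypothetical data: one column per year, so the variable "year"
# is hidden in the column names (not tidy)
wide <- tibble(
  country = c("FR", "DE"),
  `2020` = c(10, 20),
  `2021` = c(11, 22)
)

# Tidy version: one column per variable (country, year, value),
# one row per observation, one value per cell
tidy <- pivot_longer(
  wide,
  cols = c(`2020`, `2021`),
  names_to = "year",
  values_to = "value"
)
tidy
```

The wide table has 2 rows and 3 columns; the tidy one has 4 rows (one per country-year pair) and 3 columns.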
readr
The package for reading flat files (.csv, .txt…)
Returns a tibble, the tidyverse's enhanced dataframe
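A minimal sketch of reading a CSV with readr::read_csv(). To keep it self-contained, the "file" is an inline string wrapped in I(), which marks it as literal data (this assumes readr 2.0 or later); in practice you would pass a path such as "data.csv". The column names and values are made up:

```r
library(readr)

# Inline CSV text standing in for a real file on disk
csv_text <- "country,year,value
FR,2020,10
DE,2020,20
IT,2020,15"

# I() = treat the string as data, not as a path;
# show_col_types = FALSE silences the column-type message
df <- read_csv(I(csv_text), show_col_types = FALSE)

class(df)  # includes "tbl_df": the result is a tibble
df
```

Column types are guessed from the content: country becomes character, year and value become numeric.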
dplyr
The core package of the tidyverse's data manipulation ecosystem;
Used for both data manipulation and descriptive statistics;
Main verbs
Each verb takes a tibble (augmented dataframe) as its first argument and returns a new tibble:
select(): select variables by their name;
rename(): rename variables;
filter(): select observations according to one or more conditions;
arrange(): sort the table according to one or more variables;
mutate(): add variables that are functions of other variables;
summarise(): compute summary statistics (a mean, a count…) from the data;
group_by(): perform operations by group.
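The verbs above are designed to be chained. A minimal sketch on a made-up table (region, pop in thousands and area_km2 are hypothetical columns with arbitrary values), using the native pipe |>, which assumes R 4.1 or later:

```r
library(dplyr)

# Hypothetical data, arbitrary values for illustration only
pop <- tibble(
  region = c("A", "B", "C", "D"),
  year = c(2020, 2020, 2020, 2020),
  pop = c(60, 80, 40, 55),          # population, in thousands
  area_km2 = c(500, 350, 300, 600)
)

result <- pop |>
  select(region, pop, area_km2) |>                   # keep three variables
  rename(population = pop) |>                        # new_name = old_name
  filter(population > 50) |>                         # drop region C
  mutate(density = population * 1000 / area_km2) |>  # inhabitants per km2
  arrange(desc(density))                             # highest density first
result
```

Each step returns a new tibble, so the original pop table is left unchanged; the result keeps regions B, A and D, in that order.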
Data manipulation
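Descriptive statistics by group combine group_by() and summarise(). A minimal sketch on a small made-up panel (region, year and amount are hypothetical columns); the .groups argument of summarise() assumes dplyr 1.0 or later:

```r
library(dplyr)

# Hypothetical panel: two regions observed over three years
sales <- tibble(
  region = c("North", "North", "North", "South", "South", "South"),
  year = c(2021, 2022, 2023, 2021, 2022, 2023),
  amount = c(100, 120, 110, 80, 90, 100)
)

by_region <- sales |>
  group_by(region) |>              # following verbs operate per region
  summarise(
    n_years = n(),                 # number of rows in each group
    mean_amount = mean(amount),
    max_amount = max(amount),
    .groups = "drop"               # return an ungrouped tibble
  )
by_region
```

The output has one row per group: North with a mean of 110, South with a mean of 90.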