Chapter 3 Transforming Data
{dplyr}
basics
The book R for Data Science is a very helpful reference guide. Chapter 5 covers many of the topics covered in this section, and may be useful as a resource later or to dive deeper into a topic.
Topics:
- Filter rows
- Comparisons
- Logical operators
- Missing values
- Arrange rows
- Select columns
- Add new variables
- Grouped summaries
- Combining multiple operations with the pipe
- Missing values
- Counts
- Useful summary functions
- Grouping by multiple variables
- Ungrouping
- Grouped mutates (and filters)
- Extra: Working with factor variables
3.1 An example
The following example ties several of the above ideas together. Imagine that we have a set of grades from a course that we would like to convert to letter grades using a particular weighting and letter-grade cut-offs.
<- "https://raw.githubusercontent.com/haleyjeppson/R4QM/main/data/CourseData.txt"
url <- readr::read_table(url)[, -1]
students students
## # A tibble: 70 × 6
## College Year Hmwk E1 E2 E3
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 NU 1 83.8 79 89 59
## 2 NU 1 77.1 60 97 63
## 3 NU 1 94.5 83 88 69
## 4 NU 1 84.3 91 77 72
## 5 NU 1 73.4 68 79 76
## 6 NU 1 94.7 89 96 79
## 7 NU 1 77.7 76 43 84
## 8 NU 1 92.9 92 81 84
## 9 NU 1 95.9 84 85 84
## 10 NU 1 98.2 99 84 85
## # … with 60 more rows
head(students)
## # A tibble: 6 × 6
## College Year Hmwk E1 E2 E3
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 NU 1 83.8 79 89 59
## 2 NU 1 77.1 60 97 63
## 3 NU 1 94.5 83 88 69
## 4 NU 1 84.3 91 77 72
## 5 NU 1 73.4 68 79 76
## 6 NU 1 94.7 89 96 79
tail(students)
## # A tibble: 6 × 6
## College Year Hmwk E1 E2 E3
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MC 2 83.2 69 77 93
## 2 MC 2 95.7 88 89 92
## 3 MC 2 79.3 80 61 97
## 4 MC 2 78.1 69 89 97
## 5 MC 2 99.1 84 89 96
## 6 MC 2 95.8 96 100 100
In this particular case we want the weighting to be:
- 20% to Hmwk (col 3)
- 25% to E1 (col 4)
- 25% to E2 (col 5)
- 30% to E3 (col 6)
But those weights could change later. A function can be written to take the data, and the weights and calculate the weighted average. We can then add that score to the students data frame, and add a final column with the actual letter grades on a 90-80-70-60 scale.
<- students %>%
students mutate(Final = .2*Hmwk + .25*E1 + .25*E2 + .3*E3,
Grade = case_when(
< 60 ~ "F",
Final < 69 ~ "D",
Final < 79 ~ "C",
Final < 89 ~ "B",
Final >= 90 ~ "A"))
Final
head(students)
## # A tibble: 6 × 8
## College Year Hmwk E1 E2 E3 Final Grade
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 NU 1 83.8 79 89 59 76.5 C
## 2 NU 1 77.1 60 97 63 73.6 C
## 3 NU 1 94.5 83 88 69 82.4 B
## 4 NU 1 84.3 91 77 72 80.5 B
## 5 NU 1 73.4 68 79 76 74.2 C
## 6 NU 1 94.7 89 96 79 88.9 B