class: inverse, center, middle # Tidy Data ---- .right.white[Slides adapted from [Bradley Boehmke R-Intro course](https://github.com/uc-r/Intro-R)] --- # Tidy Data <br><br> <img src="images/tidy-task.png" width="2560" style="display: block; margin: auto;" /> --- ## Tidy data simplifies life .pull-left[ <br> .font130[ A data set is tidy if: 1. Each .blue[__variable__] in in its own .blue[__column__] 2. Each .red[__observation__] is in its own .red[__row__] 3. Each .grey[__value__] is in its own .grey[__cell__] ] ] .pull-right[ <br> <img src="images/tidy-data-example.png" width="984" style="display: block; margin: auto;" /> ] --- .pull-left[ You are going to learn four key __tidyr__ functions that allow you to solve the vast majority of your data tidying challenges: - `pivot_longer`: transforms data from wide to long - `pivot_wider`: transforms data from long to wide - `separate`: splits a single column into multiple columns - `unite`: combines multiple columns into a single column ] .pull-right[ <br> <img src="images/tidyr-large.png" width="50%" height="50%" style="display: block; margin: auto;" /> ] --- layout: false # Prerequisites .pull-left[ ### Packages ```r library(tidyverse) # or directly w/library(tidyr) ``` ] .pull-right[ ### Data Follow along in examples: ```r load("data/tidy_data.RData") ``` Your turn exercises: ```r bomber_wide <- read_rds("data/bomber_wide.rds") bomber_long <- read_rds("data/bomber_long.rds") bomber_prefix <- read_rds("data/bomber_prefix.rds") bomber_mess <- read_rds("data/bomber_mess.rds") ``` ] --- # .red.font120[`pivot_longer`]: wide to long .bold[We can transform wide data to long with .font140.grey[`pivot_longer()`]] .center[ <img src="images/tidyr-01.png" width="40%" height="40%" style="display: block; margin: auto;" /> ] -- .center[ .blue[cases] %>% pivot_longer(!country, names_to = "Year", values_to = "n") .white.font80[ .content-box-blue-dark[data frame to transform].content-box-white[name of new key colum] .content-box-white[name of new value column].content-box-white[columns selection] ] ] --- # .red.font120[`pivot_longer`]: wide to long .bold[We can transform wide data to long with .font140.grey[`pivot_longer()`]] .center[ <img src="images/tidyr-02.png" width="40%" height="40%" style="display: block; margin: auto;" /> cases %>% pivot_longer(!country, names_to = .green["Year"], value_to = "n") .white.font80[ .content-box-blue[data frame to transform].content-box-green-dark[name of new key colum] .content-box-white[name of new value column].content-box-white[columns selection] ] ] --- # .red.font120[`pivot_longer`]: wide to long .bold[We can transform wide data to long with .font140.grey[`pivot_longer()`]] .center[ <img src="images/tidyr-03.png" width="40%" height="40%" style="display: block; margin: auto;" /> cases %>% pivot_longer(!country, names_to = "Year", value_to = .red["n"]) .white.font80[ .content-box-blue[data frame to transform].content-box-green[name of new key colum] .content-box-red-dark[name of new value column].content-box-white[columns selection] ] ] --- # .red.font120[`pivot_longer`]: wide to long .bold[We can transform wide data to long with .font140.grey[`pivot_longer()`]] .center[ <img src="images/tidyr-04.png" width="40%" height="40%" style="display: block; margin: auto;" /> cases %>% pivot_longer(.grey[!country], names_to = "Year", value_to = "n") .white.font80[ .content-box-blue[data frame to transform].content-box-green[name of new key colum] .content-box-red[name of new value column].content-box-grey-dark[columns selection] ] ] --- # .red.font120[`pivot_longer`]: wide to long .bold[We can transform wide data to long with .font140.grey[`pivot_longer()`]] .pull-left[ .center.font120.bold[Cod Alternatives] ```r # These all produce the same results: cases %>% pivot_longer(cols = 2:4, names_to = "Year", values_to = "n") cases %>% pivot_longer(cols = `2011`:`2013`, names_to = "Year", values_to = "n") cases %>% pivot_longer(cols = c(`2011`,`2012`,`2013`), names_to = "Year", values_to = "n") cases %>% pivot_longer(cols = starts_with("20"), names_to = "Year", values_to = "n") cases %>% pivot_longer(cols = -country, names_to = "Year", values_to = "n") ``` ] .pull-right[ <img src="images/tidyr-04.png" width="1077" style="display: block; margin: auto;" /> ] --- # .red.font120[`pivot_wider`]: long to wide .bold[We can transform long data to wide with .font140.grey[`pivot_wider()`]] .center[ <img src="images/tidyr-spread.png" width="40%" height="40%" style="display: block; margin: auto;" /> ] -- .center[ cases %>% pivot_wider(names_from = .purple[Year], valueS_from = .red[n]) .white.font80[ .content-box-purple-dark[column to use as new column names] .content-box-red-dark[column to use as values] ] ] --- class: yourturn # Your Turn! .pull-left[ ### Challenge 1. Reshape the `bomber_wide` data from ___wide to long___ and name the new value column "Flying_Hrs" 2. Reshape the `bomber_long` data from ___long to wide___ using the "Output" variable for the new column names and the "Value" variable to fill in values ] -- .pull-right[ ### Solution ```r # 1 bomber_wide %>% pivot_longer(cols = `1996`:`2014`, names_to = "Year", values_to = "Flying_Hrs") %>% head() ## # A tibble: 6 x 4 ## Type MD Year Flying_Hrs ## <chr> <chr> <chr> <int> ## 1 Bomber B-1 1996 26914 ## 2 Bomber B-1 1997 25219 ## 3 Bomber B-1 1998 24205 ## 4 Bomber B-1 1999 23306 ## 5 Bomber B-1 2000 25013 ## 6 Bomber B-1 2001 25059 ``` ] --- class: yourturn # Your Turn! .pull-left[ ### Challenge 1. Reshape the `bomber_wide` data from ___wide to long___ and name the new value column "Flying_Hrs" 2. Reshape the `bomber_long` data from ___long to wide___ using the "Output" variable for the new column names and the "Value" variable to fill in values ] .pull-right[ ### Solution ```r # 2 bomber_long %>% pivot_wider(names_from = Output, values_from = Value) %>% head() ## # A tibble: 6 x 6 ## Type MD FY FH Gallons Cost ## <chr> <chr> <int> <int> <int> <int> ## 1 Bomber B-1 1996 26914 88594449 72753781 ## 2 Bomber B-1 1997 25219 85484074 71297263 ## 3 Bomber B-1 1998 24205 85259038 84026805 ## 4 Bomber B-1 1999 23306 79323816 71848336 ## 5 Bomber B-1 2000 25013 86230284 58439777 ## 6 Bomber B-1 2001 25059 86892432 94946077 ``` ] --- # .red.font120[`separate`]: one to multiple .bold[We can split a single column into multiple columns using .grey[`separate()`]] .center[ <img src="images/tidyr-separate-1.png" width="90%" height="90%" style="display: block; margin: auto;" /> ] --- # .red.font120[`separate`]: one to multiple .bold[We can split a single column into multiple columns using .grey[`separate()`]] .center[ <img src="images/tidyr-separate-2.png" width="90%" height="90%" style="display: block; margin: auto;" /> ] .center[ storms %>% separate(col = .purple[date], into = c("year", "month", "day"), sep = "-") .white.font80[ .content-box-purple-dark[column to split] .content-box-white[names of new columns] .content-box-white[separator to split by] ] ] --- # .red.font120[`separate`]: one to multiple .bold[We can split a single column into multiple columns using .grey[`separate()`]] .center[ <img src="images/tidyr-separate-3.png" width="90%" height="90%" style="display: block; margin: auto;" /> ] .center[ storms %>% separate(col = date, into = .red[c("year", "month", "day")], sep = "-") .white.font80[ .content-box-purple[column to split] .content-box-red-dark[names of new columns] .content-box-white[separator to split by] ] ] --- # .red.font120[`separate`]: one to multiple .bold[We can split a single column into multiple columns using .grey[`separate()`]] .center[ <img src="images/tidyr-separate-4.png" width="90%" height="90%" style="display: block; margin: auto;" /> ] .center[ storms %>% separate(col = date, into = c("year", "month", "day"), sep = .grey["-"]) .white.font80[ .content-box-purple[column to split] .content-box-red[names of new columns] .content-box-grey-dark[separator to split by] ] ] --- # .red.font120[`unite`]: multiple to one .bold[We can combine multiple columns into one using .grey[unite()`]] .center[ <img src="images/tidyr-unite.png" width="90%" height="90%" style="display: block; margin: auto;" /> ] .center[ storms %>% unite(col = .purple[date], .red[c("year", "month", "day")], sep = "-") .white.font80[ .content-box-purple-dark[new column name] .content-box-red-dark[colums to combine] .content-box-grey-dark[separator to use] ] ] --- class: yourturn # Your Turn! .pull-left[ ### Challenge 1. Reshape the `bomber_prefix` data so that the "prefix" and "number" columns are combined into a “MD” variable with “-“ separator ] -- .pull-right[ ### Solution ```r bomber_prefix %>% unite(col = MD, prefix, number, sep = "-") %>% head ## Type MD FY Output Value ## 1 Bomber B-1 1996 FH 26914 ## 2 Bomber B-1 1997 FH 25219 ## 3 Bomber B-1 1998 FH 24205 ## 4 Bomber B-1 1999 FH 23306 ## 5 Bomber B-1 2000 FH 25013 ## 6 Bomber B-1 2001 FH 25059 ``` ] --- # Leverage cheat sheet .font130.center[Use __tidyr__ tips on back of Data Import cheat sheet] .font120.center[[hrstudio.com/resources/cheatsheets](https://www.rstudio.com/resources/cheatsheets/)] <img src="images/cheatsheet-tidyr.png" width="2592" style="display: block; margin: auto;" /> --- class: yourturn # Big Challenge .pull-left[ ### Challenge .font120[Reshape the __bomber_mess__ data so it looks like:] ``` ## # A tibble: 57 x 6 ## Type MD FY FH Gallons Cost ## <chr> <chr> <chr> <int> <int> <int> ## 1 Bomber B-1 1996 26914 88594449 72753781 ## 2 Bomber B-1 1997 25219 85484074 71297263 ## 3 Bomber B-1 1998 24205 85259038 84026805 ## 4 Bomber B-1 1999 23306 79323816 71848336 ## 5 Bomber B-1 2000 25013 86230284 58439777 ## 6 Bomber B-1 2001 25059 86892432 94946077 ## 7 Bomber B-1 2002 26581 89198262 96458536 ## 8 Bomber B-1 2003 21491 74485788 68650070 ## 9 Bomber B-1 2004 28118 101397707 101895634 ## 10 Bomber B-1 2005 21859 78410415 124816690 ## # … with 47 more rows ``` ] -- .pull-right[ ### Solution ```r bomber_mess %>% unite(col = MD, prefix:number, sep = "-") %>% separate(Metric, into = c("FY", "Output")) %>% pivot_wider(names_from= Output, values_from = Value) %>% as_tibble() %>% head ## # A tibble: 6 x 6 ## Type MD FY FH Gallons Cost ## <chr> <chr> <chr> <int> <int> <int> ## 1 Bomber B-1 1996 26914 88594449 72753781 ## 2 Bomber B-1 1997 25219 85484074 71297263 ## 3 Bomber B-1 1998 24205 85259038 84026805 ## 4 Bomber B-1 1999 23306 79323816 71848336 ## 5 Bomber B-1 2000 25013 86230284 58439777 ## 6 Bomber B-1 2001 25059 86892432 94946077 ``` ] --- # Key things to remember .pull-left[ <br> * .bold[`pivot_longer`]: reshape wide data to long * .bold[`pivot_wider`]: reshape long data to wide * .bold[`separate`]: reshape a single variable into multiple * .bold[`unite`]: reshape multiple variables into one * .bold[Note:] .bold[`tidyr`] just got some updates and some of this functionality will be changing (i.e. `gather()` and `spread()` are changing to `pivot_longer()` and `pivot_wider()`) ] .pull-right[ <br><br> <img src="images/information-overload.jpg" style="display: block; margin: auto;" /> ] --- # Questions? <br> <img src="images/questions.png" width="450" height="450" style="display: block; margin: auto;" />