Tibble Cheat Sheet



A data frame comes from base R and is used for storing data tables. The tidyverse provides the “tibble”, which is intended to be a slightly better version of a data frame. (From an R cheat sheet by @MaryJoWebster, March 2019.)

In R, vectors are the most common data structure. In this book, we’ll often represent vectors as diagrams in which each orange cell represents one element of the vector. As you’ll see, different kinds of vectors can hold different kinds of elements.

There are two kinds of vectors: atomic vectors and lists. Tibbles are a specific kind of list.

In this chapter, we’ll cover these three data structures, explaining how they differ and showing you how to manipulate each one.

2.1 Atomic vectors

Atomic vectors are the “atoms” of R—the simple building blocks upon which all else is built. There are four types of atomic vector that are important for data analysis:

  • integer vectors (<int>) contain integers.
  • double vectors (<dbl>) contain real numbers.
  • character vectors (<chr>) contain strings, written with single or double quotes.
  • logical vectors (<lgl>) contain TRUE or FALSE.

Integer atomic vectors contain only integers, double atomic vectors contain only doubles, and so on. Together, integer and double vectors are known as numeric vectors. All vectors can also contain the missing value NA.


In R, single numbers, logicals, and strings are just atomic vectors of length 1, so typing a lone string such as "a" creates a character vector. Likewise, a lone number such as 1.5 creates a double vector.

To create atomic vectors with more than one element, use c() to combine values.

To create an integer vector by hand, you’ll need to add L to the end of each number.

Without the Ls, R will create a vector of doubles.
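The original examples are not reproduced here, so the following is only a minimal sketch with made-up values. It shows c() combining values and the effect of the L suffix.

```r
# Illustrative values only; not taken from the original text.
dbl_vec <- c(1, 2.5, 4)          # no L suffix, so R stores doubles
int_vec <- c(1L, 2L, 4L)         # the L suffix marks each number as an integer
chr_vec <- c("a", "b", "c")      # strings give a character vector
lgl_vec <- c(TRUE, FALSE, TRUE)  # logical vector

# typeof() reports how R stores each vector.
typeof(dbl_vec)  # "double"
typeof(int_vec)  # "integer"
```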

2.1.1 Properties

Vectors (both atomic vectors and lists) all have two key properties: type and length.

You can check the type of any vector with typeof().

Use length() to find a vector’s length.

Atomic vectors can also have names.

You can access a vector’s names with names().
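As a rough illustration of these properties, here is a sketch using a hypothetical named vector x (the name and values are assumptions, not from the original):

```r
# Hypothetical named double vector, used only for illustration.
x <- c(a = 1.5, b = 2, c = 10)

typeof(x)  # "double"
length(x)  # 3
names(x)   # "a" "b" "c"
```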

2.1.2 Subsetting

v is an atomic vector of doubles.

We can subset v to select specific elements, ignoring the others.

The operators [ and [[ subset vectors. Use [ to select multiple elements, and [[ to select just one. We’ll cover four ways to use [ and then discuss [[.

Positive integers

Subset with a vector of positive integers to extract elements by position.

Note that, in R, indices start at 1, not 0, so subsetting with c(1, 2) extracts the first two elements of v.

You can also use : to create a vector of adjacent integers. The following select the first three elements of v.
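The definition of v is not shown in this copy, so the sketch below assumes a small named vector of doubles. The values and names are hypothetical.

```r
# Hypothetical stand-in for v; the original values are not shown.
v <- c(a = 1.1, b = 2.2, c = 3.3, d = 4.4)

first_two   <- v[c(1, 2)]  # elements at positions 1 and 2
first_three <- v[1:3]      # 1:3 is shorthand for c(1L, 2L, 3L)

first_two
first_three
```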

Negative integers

Subset with a vector of negative integers to exclude elements. The following code removes the first and third elements of v.
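A minimal sketch of negative subsetting, again assuming hypothetical values for v:

```r
# Hypothetical stand-in for v.
v <- c(a = 1.1, b = 2.2, c = 3.3, d = 4.4)

# Negative positions drop elements instead of selecting them.
without_1_and_3 <- v[c(-1, -3)]
without_1_and_3
```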

Names

If a vector has names, you can subset with a character vector.
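For example, assuming v has the hypothetical names a through d, subsetting by name might look like this:

```r
# Hypothetical stand-in for v, with names.
v <- c(a = 1.1, b = 2.2, c = 3.3, d = 4.4)

# Select elements by name rather than by position.
by_name <- v[c("a", "c")]
by_name
```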

Logical vectors

If you supply a vector of TRUEs and FALSEs, [ will select the elements that correspond to the TRUEs.

The following extracts just the first and third elements.
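A sketch of this, assuming a hypothetical four-element v:

```r
# Hypothetical stand-in for v.
v <- c(a = 1.1, b = 2.2, c = 3.3, d = 4.4)

# One TRUE or FALSE per element; TRUE keeps the element at that position.
first_and_third <- v[c(TRUE, FALSE, TRUE, FALSE)]
first_and_third
```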

You’ll rarely subset by typing out TRUEs and FALSEs. Instead, you’ll typically create a logical vector with a function or condition.

For example, the following code selects just the elements of v greater than 2.

v > 2 results in a logical vector the same length as v.

[ then uses this logical vector to subset v, resulting in just the elements of v greater than 2.
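The two steps can be sketched as follows, with hypothetical values for v:

```r
# Hypothetical stand-in for v.
v <- c(a = 1.1, b = 2.2, c = 3.3, d = 4.4)

# Step 1: the condition produces one logical value per element of v.
is_big <- v > 2
is_big

# Step 2: [ keeps the elements where the condition is TRUE.
big_values <- v[v > 2]
big_values
```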

v_missing has NAs.

We can pass !is.na(v_missing) into [ to extract out just the non-NA elements.
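The contents of v_missing are not shown in this copy; the sketch below assumes made-up values with two NAs.

```r
# Hypothetical stand-in for v_missing.
v_missing <- c(1.1, NA, 5, NA)

# is.na() flags the missing elements; ! inverts the flags.
not_missing <- v_missing[!is.na(v_missing)]
not_missing
```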

Select single values with [[

Unlike [, [[ can only extract single elements.

You’ll get an error if you try to use [[ to select more than one element.
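A small sketch of both behaviors, assuming hypothetical values for v. The tryCatch() wrapper is there only so the example runs to completion; it captures the error instead of stopping.

```r
# Hypothetical stand-in for v.
v <- c(a = 1.1, b = 2.2, c = 3.3, d = 4.4)

# [[ returns the single element itself.
second <- v[[2]]
second

# Asking [[ for two elements of an atomic vector is an error.
result <- tryCatch(v[[c(1, 2)]], error = function(e) e)
inherits(result, "error")
```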

Use [[ instead of [ if you want to make it clear that your code only selects one item. As you’ll see in the Lists section, the distinction between [[ and [ is more important for lists than for atomic vectors.

2.1.3 Applying functions

Vectors are central to programming in R, and so many R functions are designed to work with vectors of any length.

You already saw how to call typeof() to return the type of a vector.

sum() sums a vector’s elements.

You can use sum() with both numeric (i.e., double and integer) vectors, as well as with logical vectors.

When applied to a logical vector, sum() returns the number of TRUEs.

mean() works similarly.
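To make this concrete, here is a sketch with made-up vectors. Logical values are treated as 1 (TRUE) and 0 (FALSE), so the mean of a logical vector is the proportion of TRUEs.

```r
# Illustrative values only.
nums  <- c(1, 2, 3)
flags <- c(TRUE, FALSE, TRUE, TRUE)

sum(nums)    # 6
sum(flags)   # 3, the number of TRUEs
mean(nums)   # 2
mean(flags)  # 0.75, the proportion of TRUEs
```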

The Base R Cheat Sheet has some other basic helpful functions, particularly under the Vector Functions and Maths Functions sections.

2.1.4 Augmented vectors

Augmented vectors are atomic vectors with additional metadata. There are four important augmented vectors:

  • factors <fct>, which are used to represent categorical variables that can take one of a fixed and known set of possible values (called the levels).

  • ordered factors <ord>, which are like factors but where the levels have an intrinsic ordering (i.e., it’s reasonable to say that one level is “less than” or “greater than” another level).

  • dates <date>, which record a date.

  • date-times <dttm>, which are also known as POSIXct, record a date and a time.

For now, you just need to recognize these when you encounter them. You’ll learn how to create each type of augmented vector later in the course.

2.2 Lists

Unlike atomic vectors, which can only contain a single type, lists can contain any collection of R objects.

2.2.1 Basics

The following reading will introduce you to lists.

  • Recursive vectors (lists)[r4ds-20.5]

2.2.2 Flattening

You can flatten a list into an atomic vector with unlist().

unlist() returns an atomic vector even if the original list contains other lists or vectors.
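A short sketch with a made-up nested list. Note that if the list mixes types, unlist() coerces everything to a single common type.

```r
# Illustrative nested list: a number, a vector, and an inner list.
nested <- list(1, c(2, 3), list(4, 5))
flat <- unlist(nested)
flat          # 1 2 3 4 5
typeof(flat)  # "double"

# Mixed types are coerced to one type, here character.
mixed <- unlist(list(1, "a"))
mixed         # "1" "a"
```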

2.3 Tibbles

Tibbles are actually lists.

Every tibble is a list of vectors.

These vectors form the tibble columns.

Take the tibble mpg.

Each variable in mpg (manufacturer, model, displ, etc.) is a vector. manufacturer is a character vector, displ is a double vector, and so on.
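One way to check this yourself, assuming the ggplot2 package (which provides mpg) is installed:

```r
library(ggplot2)  # mpg ships with ggplot2

is.list(mpg)              # TRUE: a tibble is a list
typeof(mpg$manufacturer)  # "character"
typeof(mpg$displ)         # "double"
```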

2.3.1 Creation

There are two ways to create tibbles by hand. First, you can use tibble().

tibble() takes individual vectors and turns them into a tibble.

Second, you can use tribble().

Typically, it will be obvious whether it’s better to use tibble() or tribble(). One representation will either be much easier to type or much clearer than the other.
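The original examples are not shown here, so the following sketch builds the same small, hypothetical table both ways. tibble() takes one vector per column, while tribble() lets you lay the data out row by row.

```r
library(tibble)

# Column by column: each argument is one vector.
by_column <- tibble(
  x = c(1, 9, 5),
  y = c(TRUE, FALSE, FALSE)
)

# Row by row: column names start with ~, then values follow in row order.
by_row <- tribble(
  ~x, ~y,
   1, TRUE,
   9, FALSE,
   5, FALSE
)

by_column
by_row
```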

2.3.2 Variables

There are several ways to extract variables out of tibbles. Tibbles are lists, so [[ and $ still work.

Use pull(), the dplyr equivalent, when you want to use a pipe.

Note that pull(), like [[ and $, will return just the vector of values for a given column, while select() returns a tibble.
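The difference can be sketched with a hypothetical tibble named my_tibble (the name and values are assumptions), assuming the tibble and dplyr packages are installed:

```r
library(tibble)
library(dplyr)

# Hypothetical tibble, used only for illustration.
my_tibble <- tibble(x = c(1, 9, 5), y = c(TRUE, FALSE, FALSE))

# These three all return the column as a plain vector.
via_brackets <- my_tibble[["x"]]
via_dollar   <- my_tibble$x
via_pull     <- my_tibble %>% pull(x)

# select() keeps the result as a one-column tibble.
via_select <- my_tibble %>% select(x)
via_select
```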

2.3.3 Dimensions

Printing a tibble tells you the column names and overall dimensions.

To access the dimensions directly, you have three options:

To get the variable names, use names():
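The original code is not shown in this copy. Three base R functions that report dimensions are dim(), nrow(), and ncol(); the sketch below applies them, along with names(), to a hypothetical tibble.

```r
library(tibble)

# Hypothetical tibble with 3 rows and 2 columns.
my_tibble <- tibble(x = c(1, 9, 5), y = c(TRUE, FALSE, FALSE))

dim(my_tibble)    # 3 2 (rows, then columns)
nrow(my_tibble)   # 3
ncol(my_tibble)   # 2
names(my_tibble)  # "x" "y"
```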

Installation and use

  • Install all the packages in the tidyverse by running install.packages('tidyverse').

  • Run library(tidyverse) to load the core tidyverse and make it available in your current R session.

Learn more about the tidyverse package at https://tidyverse.tidyverse.org.

Core tidyverse

The core tidyverse includes the packages that you’re likely to use in everyday data analyses. As of tidyverse 1.3.0, the following packages are included in the core tidyverse:

ggplot2

ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

dplyr

dplyr provides a grammar of data manipulation, providing a consistent set of verbs that solve the most common data manipulation challenges.

tidyr

tidyr provides a set of functions that help you get to tidy data. Tidy data is data with a consistent form: in brief, every variable goes in a column, and every column is a variable.

readr

readr provides a fast and friendly way to read rectangular data (like csv, tsv, and fwf). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.

purrr

purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors. Once you master the basic concepts, purrr allows you to replace many for loops with code that is easier to write and more expressive.

tibble

tibble is a modern re-imagining of the data frame, keeping what time has proven to be effective, and throwing out what it has not. Tibbles are data.frames that are lazy and surly: they do less and complain more, forcing you to confront problems earlier, typically leading to cleaner, more expressive code.

stringr

stringr provides a cohesive set of functions designed to make working with strings as easy as possible. It is built on top of stringi, which uses the ICU C library to provide fast, correct implementations of common string manipulations.

forcats

forcats provides a suite of useful tools that solve common problems with factors. R uses factors to handle categorical variables, variables that have a fixed and known set of possible values.

The tidyverse also includes many other packages with more specialised usage. They are not loaded automatically with library(tidyverse), so you’ll need to load each one with its own call to library().

Import

As well as readr, for reading flat files, the tidyverse package installs a number of other packages for reading data:

  • DBI for relational databases. (Maintained by Kirill Müller.) You’ll need to pair DBI with a database-specific backend like RSQLite, RMariaDB, RPostgres, or odbc. Learn more at https://db.rstudio.com.

  • haven for SPSS, Stata, and SAS data.

  • httr for web APIs.

  • readxl for .xls and .xlsx sheets.

  • rvest for web scraping.

  • jsonlite for JSON. (Maintained by Jeroen Ooms.)

  • xml2 for XML.

Wrangle

In addition to tidyr and dplyr, there are five packages (including stringr and forcats) that are designed to work with specific types of data:

  • lubridate for dates and date-times.
  • hms for time-of-day values.
  • blob for storing blob (binary) data.

Program

In addition to purrr, which provides very consistent and natural methods for iterating on R objects, there are two additional tidyverse packages that help with general programming challenges:

  • magrittr provides the pipe, %>%, used throughout the tidyverse. It also provides a number of more specialised piping operators (like %$% and %<>%) that can be useful in other places.

  • glue provides an alternative to paste() that makes it easier to combine data and strings.
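As a rough illustration of both packages, here is a sketch with made-up values, assuming magrittr and glue are installed. It compares glue() with paste0() and shows a short pipeline built with %>%.

```r
library(magrittr)
library(glue)

# Made-up values for illustration.
name <- "tibble"
n <- 3

# paste0() joins the pieces by hand; glue() fills in the {placeholders}.
with_paste <- paste0("The ", name, " has ", n, " columns.")
with_glue  <- glue("The {name} has {n} columns.")
with_glue

# The pipe passes the left-hand result into the next function call.
piped <- c(1, 4, 9) %>% sqrt() %>% sum()
piped  # 6
```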

Model

Modeling with the tidyverse uses the collection of tidymodels packages, which largely replace the modelr package used in R4DS. These packages provide a comprehensive foundation for creating and using models of all types. Visit the Getting Started guide or, for more detailed examples, go straight to the Learn page.

Get help

If you’re asking for R help, reporting a bug, or requesting a new feature, you’re more likely to succeed if you include a good reproducible example, which is precisely what the reprex package is meant for. You can learn more about reprex, along with other tips on how to help others help you in the help section.