Data Visualization With Ggplot2 Cheat Sheet



ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same components: a data set, a coordinate system, and geoms, the visual marks that represent data points.

In this lesson we will dive into making common graphics with ggplot2, following the approach of the R Graphics Cookbook by Winston Chang.

ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

You will likely find RStudio’s Data Visualization Cheat Sheet helpful as a quick reference. If you want to learn more about ggplot2 after this lesson, the documentation has some good suggestions. ggplot2: Elegant Graphics for Data Analysis is the definitive book on the subject.

ggplot2 Glossary

  • The data is what we want to visualize. It consists of variables, which are stored as columns in a data frame.
  • geoms are the geometric objects that are drawn to represent the data, such as bars, lines, and points.
  • Aesthetic attributes, or aesthetics, are visual properties of geoms, such as x and y position, line color, point shapes, etc.
  • mappings connect variables in the data to aesthetics; for example, one variable can be mapped to x position and another to color.
  • scales control the mapping from the values in the data space to values in the aesthetic space. A continuous y scale maps larger numerical values to vertically higher positions in space.
  • guides show the viewer how to map the visual properties back to the data space. The most commonly used guides are the tick marks and labels on an axis.

ggplot2 Mechanics

The ggplot() function is used to initialize the basic graph structure, then we add to it. The basic idea is that you specify different parts of the plot, and add them together using the + operator.

We will start with a blank plot.
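A minimal sketch of a blank plot follows. The data set is not shown in this excerpt; the column name gdpPercap and the country Afghanistan used later suggest the gapminder data frame from the gapminder package, so that is assumed here.

```r
library(ggplot2)
library(gapminder)  # assumption: the lesson's data is the gapminder data frame

# Initialize the plot with data only; no geoms or aesthetic mappings yet
p <- ggplot(gapminder)
p
```

Printing p draws an empty panel, because nothing has yet told ggplot2 what to draw.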

Geometric objects are the actual marks we put on a plot. Examples include:

  • geom_point()
  • geom_line()
  • geom_boxplot()

To visualize data, a plot should have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator. Adding a geom without mapping the aesthetics it requires, for example geom_point() with no x or y, produces an informative error message.
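The sketch below shows one way to see that error without stopping a script. It assumes the gapminder data frame from the gapminder package, and it captures the error with tryCatch(), which is an addition for illustration. The message should say that geom_point() is missing required aesthetics.

```r
library(ggplot2)
library(gapminder)  # assumption: the lesson's data is the gapminder data frame

# A point layer with no x or y mapping
p <- ggplot(gapminder) + geom_point()

# The error is raised when the plot is built (for example, when it is printed)
msg <- tryCatch(
  {
    ggplot_build(p)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")
```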

Each type of geom usually requires certain aesthetics and accepts only a subset of all aesthetics. Refer to the help pages to see which mappings each geom accepts, for instance help(geom_point). Aesthetic mappings are set with the aes() function. Commonly used aesthetics include:

  • position (x and y coordinates)
  • color (“outside” color)
  • fill (“inside” color)
  • shape (of points)
  • linetype
  • size

To create a scatter plot we need to use the aes() function to tell ggplot which variable should be used for the x-coordinate of the points and which variable should be used for the y-coordinate of the points…

Scatter Plot

How has life expectancy changed over time?
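A minimal sketch of such a scatter plot, assuming the gapminder data frame from the gapminder package with its year and lifeExp columns:

```r
library(ggplot2)
library(gapminder)  # assumption: the lesson's data is the gapminder data frame

# Map year to the x position and life expectancy to the y position
p <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_point()
p
```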

Try replacing geom_point() with geom_jitter() to deal with over-plotting.

How is life expectancy related to GDP per capita?
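One possible sketch, again assuming the gapminder data frame. The log-scaled x axis is an addition not stated in the text; it is included because GDP per capita is strongly right-skewed, and it can be removed to see the raw relationship.

```r
library(ggplot2)
library(gapminder)  # assumption: the lesson's data is the gapminder data frame

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()  # optional: spreads out the low-GDP observations
p
```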

Bar Graph

What does the distribution of life expectancy look like?
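A histogram is one way to answer this. The sketch assumes the gapminder data frame; the bin width of 2 years is an arbitrary choice for illustration.

```r
library(ggplot2)
library(gapminder)  # assumption: the lesson's data is the gapminder data frame

# Only x is mapped; geom_histogram() computes the counts for the y axis
p <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(binwidth = 2)
p
```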

Try plotting a histogram of GDP per capita, gdpPercap.

Line Graph

How has life expectancy changed in Afghanistan over time?
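A minimal sketch, assuming the gapminder data frame. The rows for Afghanistan are selected with base R's subset(), and the helper name afghanistan is introduced here for illustration.

```r
library(ggplot2)
library(gapminder)  # assumption: the lesson's data is the gapminder data frame

# Keep only the rows for one country
afghanistan <- subset(gapminder, country == "Afghanistan")

p <- ggplot(afghanistan, aes(x = year, y = lifeExp)) +
  geom_line()
p
```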

How has GDP per capita changed in Afghanistan over time?

Grouping

How has life expectancy changed in each country over time?
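The group aesthetic tells geom_line() to draw a separate line for each country. A sketch assuming the gapminder data frame:

```r
library(ggplot2)
library(gapminder)  # assumption: the lesson's data is the gapminder data frame

# Without group = country, all observations would be joined into one zigzag line
p <- ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line()
p
```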

Faceting

Do these time trends differ between continents?
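Faceting splits a plot into one panel per value of a variable. The sketch below assumes the gapminder data frame and uses facet_wrap() on its continent column.

```r
library(ggplot2)
library(gapminder)  # assumption: the lesson's data is the gapminder data frame

p <- ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line() +
  facet_wrap(~ continent)  # one panel per continent
p
```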

Do the trends for GDP per capita look similar?

Layering

Does the relationship between life expectancy and time look linear? The first layer in this graph shows all the data points, the second layer shows a smoothed trend line and confidence interval.
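A sketch of the two layers described above, assuming the gapminder data frame. The explicit formula argument is added only to avoid the message geom_smooth() prints when it picks a default.

```r
library(ggplot2)
library(gapminder)  # assumption: the lesson's data is the gapminder data frame

p <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_point() +                                   # layer 1: the raw data
  geom_smooth(method = "loess", formula = y ~ x)   # layer 2: trend and confidence band
p
```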

If you change the smoothing method from loess to lm, does the linear model look reasonable?

Density Plots

What does the joint distribution of GDP per capita and life expectancy look like?
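One way to look at a joint distribution is with two-dimensional density contours. The sketch assumes the gapminder data frame; the choice of geom_density_2d() and the log-scaled x axis are assumptions for illustration, not necessarily the geom and scale the lesson used.

```r
library(ggplot2)
library(gapminder)  # assumption: the lesson's data is the gapminder data frame

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_density_2d() +  # contour lines of the estimated 2D density
  scale_x_log10()
p
```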

Does this relationship change over time?

Saving

There are a couple of ways to save figures made with ggplot2.

The easiest way to save a graph is to export directly from the RStudio Plots panel, by clicking on Export when the image is plotted. This will give you the option of png or pdf and allow you to select the directory where you wish to save the file.

Alternatively, use the ggsave() function from the ggplot2 package if you want to create files programmatically.
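A minimal sketch of ggsave(), assuming the gapminder data frame. The file name, dimensions, and resolution are arbitrary illustrative values, and the file is written to a temporary directory so the example leaves the working directory untouched.

```r
library(ggplot2)
library(gapminder)  # assumption: the lesson's data is the gapminder data frame

p <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_point()

# ggsave() picks the file type from the extension
out_file <- file.path(tempdir(), "lifeExp.png")
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 150)

file.exists(out_file)
```

If plot is omitted, ggsave() saves the last plot that was displayed.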

Try using ggsave() to create a file named gdpPercap.png that contains a bar chart of GDP per capita.

A Final Example

Below is an example to show off a bunch of ggplot2’s features at once. Notice that different layers in a plot can use different data (the text labels are created using a smaller set of data).
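The original example is not reproduced in this excerpt. The sketch below is a stand-in that shows the same idea under stated assumptions: it uses the gapminder data frame, plots only the year 2007, and gives the text layer its own smaller data set (the five most populous countries). The names latest and labelled are hypothetical.

```r
library(ggplot2)
library(gapminder)  # assumption: the lesson's data is the gapminder data frame

latest <- subset(gapminder, year == 2007)

# Smaller data set used only by the text layer
labelled <- head(latest[order(-latest$pop), ], 5)

p <- ggplot(latest, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent, size = pop), alpha = 0.6) +
  geom_text(data = labelled, aes(label = country), vjust = -1) +
  scale_x_log10() +
  labs(
    x = "GDP per capita",
    y = "Life expectancy",
    color = "Continent",
    size = "Population"
  )
p
```

The point layer inherits the data passed to ggplot(), while the data argument of geom_text() overrides it for that layer alone.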

This book is a practical introduction to creating effective visualizations using ggplot2. There are many excellent resources for learning ggplot2, including the following:

  • Hadley Wickham and Garrett Grolemund’s R for Data Science (R4DS)
  • The ggplot2 website
  • RStudio’s Data visualization with ggplot2 cheat sheet

We intend this book as a complement to these resources, building on what they teach about ggplot2, and we will link to them often.

R4DS, the website, and the cheat sheet mostly cover the mechanics of ggplot2. They teach you how to build plots in ggplot2, but the practice of creating effective visualizations is generally outside their scope. There is also a wealth of resources devoted to teaching effective visualization techniques, which we call visualization wisdom. These resources include:

  • William Cleveland’s books and papers, including The Elements of Graphing Data
  • Edward Tufte’s books, including The Visual Display of Quantitative Information
  • Claus Wilke’s Fundamentals of Data Visualization

Our goal is to combine ggplot2 mechanics and visualization wisdom into a single book. We hope readers come away with a solid grounding in ggplot2 and the ability to create effective visualizations for common situations. We can’t cover all types of visualizations, or all possible strategies for visualizing a given relationship. Instead, we hope to introduce readers to a way of thinking about data visualization that will help them approach and construct effective visualizations.

How to read this book

Code and data

You can easily copy the code in this book by hovering over a chunk and then clicking the copy button that appears in the upper-right corner.

Every dataset we use is contained in an R package, making it easy to follow along and replicate our examples. Many of the datasets are in our own dcldata package. To install dcldata, copy the following code and paste it into RStudio.

To access the datasets, load the relevant package and then call the dataset by name.

All datasets have accompanying documentation, which you can find with the ? operator.

You can also view the names of all datasets in a given package with the data() function.
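The three steps above can be sketched as follows. Because dcldata may not be installed, this example uses the mpg dataset that ships with ggplot2 as a stand-in; the same pattern should apply to any package that bundles datasets.

```r
library(ggplot2)

# Call a dataset by name once its package is loaded
head(mpg)

# Open the dataset's documentation (commented out so the sketch runs non-interactively)
# ?mpg

# List the names of all datasets in a package
pkg_data <- data(package = "ggplot2")
pkg_data$results[, "Item"]
```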

Organization

This book is organized thematically, with each chapter building upon the previous ones, and we recommend reading the chapters in order. For the most part, each chapter discusses a category of visualization (e.g., time series, distributions) and contains both ggplot2 mechanics and visualization wisdom. We’ve woven ggplot2 mechanics through the book so that readers quickly learn the most important skills, and then acquire more advanced skills as they become relevant to the visualization tasks at hand.

An advantage of the thematic approach is that readers just starting out with both ggplot2 and data visualization can simultaneously learn how to create plots in ggplot2 and how to make those plots effective. The thematic organization also enables readers to use the book as a visualization category reference. For example, if you know you want to visualize a distribution, you can go to the Distributions chapter and remind yourself of the various strategies.

A disadvantage of this organization is that the book does not function well as a reference for looking up ggplot2 mechanics (e.g., if you forget how scales work and try to refer back to the relevant section). The ggplot2 cheat sheet is a better resource for looking up mechanics, and we mention relevant sections so that readers can familiarize themselves with the cheat sheet for later use.

An evolving book

This book is not intended to be static. Starting in January 2019, we use this book to teach data visualization in the Stanford Data Challenge Lab (DCL) course. The DCL functions as a testing ground for educational materials, as our students give us routine feedback on what they read and do in the course. We use this feedback to constantly improve our materials, including this book. The source for the book is also available on GitHub, where we welcome suggestions for improvements.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.