R provides very powerful tools for data visualization, particularly `ggplot`. This leads to a dilemma when teaching, however. Do we first teach the basics of `ggplot` before students can visualize their data, or do we use other, simpler tools for summarization and visualization, albeit of a limited kind, and later teach more about the powerful `ggplot` tools? The `psyntur` package aims to provide some tools if the latter approach is taken. These functions aim to make it quick and easy to perform the common kinds of data exploration and visualization. All functions, however, are wrappers around sets of `ggplot` commands. In this vignette, we showcase some of these tools.
The required functions, as well as some example psychological science data sets, can be loaded with the usual `library` command.
library(psyntur)
We can produce a standard 2D scatterplot with `scatterplot`. For example, using the `faithfulfaces` data provided by `psyntur`, we can plot the perceived (sexual) faithfulness of a person in a photo against their perceived trustworthiness as follows.
scatterplot(x = trustworthy, y = faithful, data = faithfulfaces)
We can colour code the points according to the value of a third, usually categorical, variable by using the `by` argument. For example, here we colour code the points according to the sex of the person in the photo.
scatterplot(x = trustworthy, y = faithful, data = faithfulfaces, by = face_sex)
We can add the line of best fit to all points in the scatterplot, or to each set of coloured points if `by` is used, with the `best_fit_line` argument.
scatterplot(x = trustworthy, y = faithful, data = faithfulfaces,
            by = face_sex, best_fit_line = TRUE)
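Because these functions are wrappers around `ggplot` commands, it can help to see roughly what a direct call might look like. The following is a minimal sketch, assuming the ggplot2 package is installed. It is not necessarily the code that `scatterplot` runs internally, and the use of a straight-line fit via `geom_smooth(method = "lm")` is an assumption about what "line of best fit" means here.

```r
library(ggplot2)
library(psyntur)  # provides the faithfulfaces data

# Assumed hand-written counterpart of the scatterplot() call above;
# the object name p is only for illustration
p <- ggplot(faithfulfaces,
            aes(x = trustworthy, y = faithful, colour = face_sex)) +
  geom_point() +
  # One straight-line fit per colour group; se = FALSE hides the
  # confidence band (whether scatterplot() draws one is not assumed here)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(p)
```

Mapping `colour` inside `aes()` is what makes both the points and the fitted lines split by group, which mirrors the effect of combining `by` with `best_fit_line`.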
Boxplots, also known as box-and-whisker plots or Tukey boxplots, display the distributions of univariate data, optionally grouped according to other variables. For example, the `vizverb` data set contains response times collected under different conditions in an experiment. To show a boxplot of the distribution of all these response times, we can do the following.
tukeyboxplot(y = time, data = vizverb)
We can add all the points as jittered points as follows.
tukeyboxplot(y = time, data = vizverb, jitter = TRUE)
If we want to plot the distribution of `time` for each of the two different tasks performed in the experiment, provided by the `task` variable in `vizverb`, we can set the `x` variable to `task` as follows.
tukeyboxplot(y = time, x = task, data = vizverb)
The `vizverb` data is from a two-way factorial experiment where reaction times are collected for different tasks (the `task` variable) and using different responses (the `response` variable). We can plot the distribution of `time` according to both `task` and `response` by using `x` and `by` together as follows.
tukeyboxplot(y = time, x = task, data = vizverb, by = response)
Jittered points can be put on these plots by using `jitter = TRUE`.
tukeyboxplot(y = time, x = task, data = vizverb,
             by = response, jitter = TRUE)
Continuous variables can be used as the `x` variable, as in the following example using R's built-in `ToothGrowth` data set.
tukeyboxplot(y = len, x = dose, data = ToothGrowth)
This can be used with `by` and `jitter` too.
tukeyboxplot(y = len, x = dose, data = ToothGrowth,
             by = supp, jitter = TRUE, jitter_width = 0.5)
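One detail is worth knowing when moving on to `ggplot` directly: with a continuous `x` variable, `geom_boxplot` needs an explicit `group` aesthetic to draw one box per `x` value. The following is a minimal sketch, assuming ggplot2 is installed. It makes no claim about how `tukeyboxplot` handles this internally, and the object names are only for illustration.

```r
library(ggplot2)

# ToothGrowth ships with R; dose is numeric with values 0.5, 1 and 2
p_dose <- ggplot(ToothGrowth, aes(x = dose, y = len, group = dose)) +
  geom_boxplot()

# With a second grouping variable, group by each dose/supp combination
p_dose_supp <- ggplot(ToothGrowth,
                      aes(x = dose, y = len, colour = supp,
                          group = interaction(dose, supp))) +
  geom_boxplot()

print(p_dose)
print(p_dose_supp)
```

Keeping `dose` numeric preserves the uneven spacing between 0.5, 1 and 2 on the axis, which would be lost if it were converted to a factor.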
Histograms can be made with the `histogram` function. For example, to plot the onset ages of schizophrenia, using the `schizophrenia` data, we can do the following.
histogram(x = age, data = schizophrenia)
We can change the number of bins from its default of 10 with the `bins` argument.
histogram(x = age, data = schizophrenia, bins = 20)
If we specify a grouping variable, such as `gender`, with the `by` argument, we obtain a stacked histogram.
histogram(x = age, data = schizophrenia, by = gender, bins = 20)
When using `by`, we can obtain dodged rather than stacked histograms by setting `position = 'dodge'`, as in the following example.
histogram(x = age, data = schizophrenia,
          by = gender, bins = 20, position = 'dodge')
Likewise, we can obtain overlapping histograms by setting `position = 'identity'`, as in the following example.
histogram(x = age, data = schizophrenia,
          by = gender, bins = 20, position = 'identity')
In the case of `position = 'identity'`, it is usually necessary to make the bars transparent by setting the `alpha` value to be less than 1.0, as in the following example.
histogram(x = age, data = schizophrenia,
          by = gender, bins = 20, position = 'identity', alpha = 0.7)
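Transparency matters here because, with identity positioning, each group's bars are drawn at their true heights on top of one another, so an opaque group can hide the one drawn before it. A rough direct `ggplot` counterpart might look like the sketch below, assuming ggplot2 is installed. Mapping the grouping variable to `fill` is an assumption for illustration, not a statement of how `histogram` is implemented.

```r
library(ggplot2)
library(psyntur)  # provides the schizophrenia data

# Overlapping histograms: one set of 20 bins per gender, drawn in place
# rather than stacked, with partial transparency so both remain visible
p_hist <- ggplot(schizophrenia, aes(x = age, fill = gender)) +
  geom_histogram(bins = 20, position = "identity", alpha = 0.7)

print(p_hist)
```

Changing `position` to `"stack"` or `"dodge"` in this sketch gives the other two layouts described above.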