dotsViolin. Dot Plots Mimicking Violin Plots


Modifies dot plots to have different sizes of dots mimicking violin plots and identifies modes or peaks for them (Rosenblatt, 1956; Parzen, 1962).
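
The mode-finding idea behind those references can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it assumes that peaks are the local maxima of a Gaussian kernel density estimate from stats::density(), and find_density_peaks() is a hypothetical helper name.

```r
# Hypothetical helper (not part of dotsViolin): locate local maxima of a
# kernel density estimate computed with stats::density().
find_density_peaks <- function(x, bw = "nrd0") {
  d <- density(x, bw = bw)
  # A grid point is a peak when the slope changes from rising to falling.
  slope_sign <- sign(diff(d$y))
  peak_idx <- which(diff(slope_sign) < 0) + 1
  data.frame(x = d$x[peak_idx], density = d$y[peak_idx])
}

set.seed(1)
# Toy bimodal sample: two well-separated normal clusters.
x <- c(rnorm(200, mean = 0.5, sd = 0.05), rnorm(200, mean = 2, sd = 0.1))
peaks <- find_density_peaks(x)
# Report peaks from highest to lowest density.
peaks[order(-peaks$density), ]
```

The bandwidth controls how many peaks survive smoothing, so the number of reported modes depends on that choice.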

dotsViolin is an R package (R Core Team, 2023) that uses gridExtra (Auguie, 2017), gtools (Bolker et al., 2022), tidyr (Wickham et al., 2023c), stringr (Wickham, 2022), dplyr (Wickham et al., 2023b), ggplot2 (Wickham et al., 2023a), lazyeval (Wickham, 2019), magrittr (Bache and Wickham, 2022), rlang (Henry and Wickham, 2023), scales (Wickham and Seidel, 2022) and tidyselect (Henry and Wickham, 2022).

Documentation was written with the R packages roxygen2 (Wickham et al., 2022), knitr (Xie, 2023) and rmarkdown (Allaire et al., 2023).

Related academic presentation: Roa-Ovalle (2019).

Installation

devtools::install_gitlab(repo = "ferroao/dotsViolin")

Citation

To cite package ‘dotsViolin’ in publications use:

Roa-Ovalle F, Telles M (2023). dotsViolin: Integrated tables in dot and violin R ggplots. R package version 0.0.1, https://gitlab.com/ferroao/dotsViolin.

To write citation to file:

sink("dotsViolin.bib")
toBibtex(citation("dotsViolin"))
sink()
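
An equivalent way to write the file, without redirecting console output, is writeLines(). This is a minimal sketch that uses the citation of R itself so that it runs without dotsViolin installed; the temporary file name is an illustrative choice.

```r
# Write a BibTeX entry to a file without sink().
# citation() with no arguments returns the citation for R itself;
# replace it with citation("dotsViolin") once the package is installed.
bib_file <- file.path(tempdir(), "R.bib")
writeLines(toBibtex(citation()), bib_file)

# Inspect the first lines of the file that was written.
head(readLines(bib_file), 3)
```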

Authors

Fernando Roa
Mariana PC Telles

Plot window

Define your plotting window size with something like par(pin=c(10,6)), or with svg(), png(), etc.
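
As a minimal sketch of the device-based approach, the following opens a graphics device with an explicit size, draws into it, and closes it. The file name, the dimensions and the placeholder base plot are illustrative assumptions. pdf() is used here because it is available in every R build; svg() and png() accept width and height in the same way (inches for svg(), pixels by default for png()).

```r
# Open a file-based graphics device with an explicit size (inches for pdf()).
out_file <- file.path(tempdir(), "plot_window_demo.pdf")
pdf(out_file, width = 10, height = 6)

# Placeholder plot; a dots_and_violin() call would go here instead.
plot(1:10, main = "placeholder")

# Close the device so the file is written to disk.
invisible(dev.off())
file.exists(out_file)
```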

In VSCode, you could use something like this in settings.json:

{
  "r.plot.useHttpgd": false,
  "r.plot.devArgs": {
    "width": 800,
    "height": 600
  }
}

Examples

1 Discrete Data:

library(dotsViolin)

fabaceae_mode_counts <- get_modes_counts(fabaceae_clade_n_df, "clade", "parsed_n")
fabaceae_mode_counts
clade                  m1  m2    m3  count
Caesalpinieae          12  NA    NA  29
Cassieae               14  8     12  64
Cercidoideae           14  7     NA  33
Detarioideae           12  8,17  NA  50
Dialioideae            14  NA    NA  6
Dimorphandra and rel.  14  13    NA  16
Mimosoids              13  26    14  221
outgroup               8   12    11  145
Papilionoideae         8   11    7   1410
Umtiza and rel.        14  NA    NA  7
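
To make the m1, m2 and m3 columns concrete, here is a minimal base-R sketch of ranking the most frequent values of a discrete variable. It assumes that comma-separated entries such as the 8,17 above denote values tied at the same frequency. It illustrates the idea only and is not the code of get_modes_counts(); top_modes() is a hypothetical helper.

```r
# Hypothetical helper (not part of dotsViolin): rank values by frequency and
# join values that share the same frequency with a comma.
top_modes <- function(x, n_ranks = 3) {
  freq <- table(x)
  values <- names(freq)
  counts <- as.vector(freq)
  # Distinct frequencies, highest first, limited to n_ranks levels.
  levels_desc <- head(sort(unique(counts), decreasing = TRUE), n_ranks)
  modes <- vapply(
    levels_desc,
    function(f) paste(values[counts == f], collapse = ","),
    character(1)
  )
  # Pad with NA when fewer than n_ranks frequency levels exist.
  length(modes) <- n_ranks
  setNames(modes, paste0("m", seq_len(n_ranks)))
}

# Toy haploid numbers: 12 is most frequent; 8 and 17 tie for second place.
n <- c(12, 12, 12, 8, 8, 17, 17, 14)
top_modes(n)
```
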
library(dotsViolin)

fabaceae_clade_n_df_count <- make_legend_with_stats(fabaceae_mode_counts, "label_count", 1, TRUE)
fabaceae_clade_n_df$label_count <- fabaceae_clade_n_df_count$label_count[match(
  fabaceae_clade_n_df$clade,
  fabaceae_clade_n_df_count$clade
)]
desiredorder1 <- unique(fabaceae_clade_n_df$clade)
head(fabaceae_clade_n_df)
                        tip.label          clade parsed_n
1     KX374504_Abarema_centiflora      Mimosoids       13
2   KX213142_Adenodolichos_bussei Papilionoideae       11
3      KX792912_Almaleea_cambagei Papilionoideae        8
4 KP109982_Amphithalea_cymbifolia Papilionoideae        9
5 KP230727_Argyrolobium_tuberosum Papilionoideae       13
6        GU220019_Ateleia_arsenii Papilionoideae       14
                              label_count
1 Mimosoids             13   26 14  (221)
2 Papilionoideae         8   11  7 (1410)
3 Papilionoideae         8   11  7 (1410)
4 Papilionoideae         8   11  7 (1410)
5 Papilionoideae         8   11  7 (1410)
6 Papilionoideae         8   11  7 (1410)
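
The match() idiom used above copies one label per clade onto every row of the long data frame. A minimal sketch with toy data (the data frames and values below are invented for illustration):

```r
# Per-group lookup table: one label per clade.
labels_df <- data.frame(
  clade = c("A", "B"),
  label_count = c("A  12 NA (3)", "B   8 11 (2)"),
  stringsAsFactors = FALSE
)

# Long table: one row per observation.
obs_df <- data.frame(
  clade = c("B", "A", "A", "B", "A"),
  n = c(8, 12, 12, 11, 14),
  stringsAsFactors = FALSE
)

# match() returns, for each row of obs_df, the row of labels_df with the
# same clade; indexing label_count with it repeats the label as needed.
obs_df$label_count <- labels_df$label_count[match(obs_df$clade, labels_df$clade)]
obs_df
```
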

# Dots only: the violin layer is switched off with violin = FALSE
par(mar = c(0, 0, 0, 0), omi = rep(0, 4))

dots_and_violin(
  fabaceae_clade_n_df, "clade", "label_count", "parsed_n", 2,
  30, "Chromosome haploid number", desiredorder1, 1, .85, 4,
  "ownwork",
  violin = FALSE
)

# Violin only: the dots are switched off with dots = FALSE
par(mar = c(0, 0, 0, 0), omi = rep(0, 4))

dots_and_violin(
  fabaceae_clade_n_df, "clade", "label_count", "parsed_n", 2,
  30, "Chromosome haploid number", desiredorder1, 1, .85, 4,
  dots = FALSE
)

# Default call: neither dots nor violin is switched off
par(mar = c(0, 0, 0, 0), omi = rep(0, 4))

dots_and_violin(
  fabaceae_clade_n_df, "clade", "label_count", "parsed_n", 2,
  30, "Chromosome haploid number", desiredorder1, 1, .85, 4
)

2 Continuous Data:

library(dotsViolin)

fabaceae_Cx_peak_counts_per_clade_df <- get_peaks_counts_continuous(
  fabaceae_clade_1Cx_df,
  "clade", "Cx", 2, 0.25, 1, 2
)
fabaceae_Cx_peak_counts_per_clade_df
clade                m1              m2         counts
Caesalpinieae        0.85,1.80                  2
Cassieae             0.69            0.52,0.56  6
Cercidoideae         0.60                       5
COM clade            0.35,0.50,0.83             3
Detarioideae         2.21            0.84,2.01  4
Dimorphandra & rel.  0.73,0.79                  2
Malvids              0.40            0.63       8
Mimosoids            0.70            0.43       42
outgroups            0.48            1.38,2.76  9
Papilionoideae       0.59                       212
Polygala amara       0.42                       1
Umtiza & rel.        0.65,1.05                  2
Vitis vinifera       0.43                       1
library(dotsViolin)

namecol <- "labelcountcustom"
fabaceae_clade_1Cx_modes_count_df <- make_legend_with_stats(
  fabaceae_Cx_peak_counts_per_clade_df,
  namecol, 1, TRUE
)
fabaceae_clade_1Cx_df$labelcountcustom <-
  fabaceae_clade_1Cx_modes_count_df$labelcountcustom[match(
    fabaceae_clade_1Cx_df$clade,
    fabaceae_clade_1Cx_modes_count_df$clade
  )]
desiredorder <- unique(fabaceae_clade_1Cx_df$clade)
head(fabaceae_clade_1Cx_df)
                              name     clade     Cx      genus ownwork
6      'Silene_latifolia_JF715055' outgroups 2.7000     Silene      no
7  'Fagopyrum_esculentum_NC010776' outgroups 1.4350  Fagopyrum      no
11    'Helianthus_annuus_NC007977' outgroups 2.4250 Helianthus      no
12        'Daucus_carota_NC008325' outgroups 2.8375     Daucus      no
14        'Olea_europaea_NC013707' outgroups 1.9500       Olea      no
18       'Coffea_arabica_NC008535' outgroups 0.6000     Coffea      no
                                      labelcountcustom
6  outgroups                     0.48 1.38,2.76    (9)
7  outgroups                     0.48 1.38,2.76    (9)
11 outgroups                     0.48 1.38,2.76    (9)
12 outgroups                     0.48 1.38,2.76    (9)
14 outgroups                     0.48 1.38,2.76    (9)
18 outgroups                     0.48 1.38,2.76    (9)

# Default call: neither dots nor violin is switched off
par(mar = c(0, 0, 0, 0), omi = rep(0, 4))

dots_and_violin(
  fabaceae_clade_1Cx_df, "clade", "labelcountcustom", "Cx", 3,
  3, "Genome Size", desiredorder, 0.03, 0.25, 2,
  "ownwork"
)

# Violin only: the dots are switched off with dots = FALSE
par(mar = c(0, 0, 0, 0), omi = rep(0, 4))

dots_and_violin(
  fabaceae_clade_1Cx_df, "clade", "labelcountcustom", "Cx", 3,
  3, "Genome Size", desiredorder, 0.03, 0.25, 2,
  dots = FALSE
)

# Dots only: the violin layer is switched off with violin = FALSE
par(mar = c(0, 0, 0, 0), omi = rep(0, 4))

dots_and_violin(
  fabaceae_clade_1Cx_df, "clade", "labelcountcustom", "Cx", 3,
  3, "Genome Size", desiredorder, 0.03, 0.25, 2,
  "ownwork",
  violin = FALSE
)

References

R-packages

Allaire J, Xie Y, Dervieux C, McPherson J, Luraschi J, Ushey K, Atkins A, Wickham H, Cheng J, Chang W, Iannone R. 2023. rmarkdown: Dynamic documents for R. R package version 2.24. https://CRAN.R-project.org/package=rmarkdown
Auguie B. 2017. gridExtra: Miscellaneous functions for “grid” graphics. R package version 2.3. https://CRAN.R-project.org/package=gridExtra
Bache SM, Wickham H. 2022. magrittr: A forward-pipe operator for R. R package version 2.0.3. https://CRAN.R-project.org/package=magrittr
Bolker B, Warnes GR, Lumley T. 2022. gtools: Various R programming tools. R package version 3.9.4. https://github.com/r-gregmisc/gtools
Henry L, Wickham H. 2022. tidyselect: Select from a set of strings. R package version 1.2.0. https://CRAN.R-project.org/package=tidyselect
Henry L, Wickham H. 2023. rlang: Functions for base types and core R and tidyverse features. R package version 1.1.1. https://CRAN.R-project.org/package=rlang
R Core Team. 2023. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Wickham H. 2019. lazyeval: Lazy (non-standard) evaluation. R package version 0.2.2. https://CRAN.R-project.org/package=lazyeval
Wickham H. 2022. stringr: Simple, consistent wrappers for common string operations. R package version 1.5.0. https://CRAN.R-project.org/package=stringr
Wickham H, Chang W, Henry L, Pedersen TL, Takahashi K, Wilke C, Woo K, Yutani H, Dunnington D. 2023a. ggplot2: Create elegant data visualisations using the grammar of graphics. R package version 3.4.4. https://CRAN.R-project.org/package=ggplot2
Wickham H, Danenberg P, Csárdi G, Eugster M. 2022. roxygen2: In-line documentation for R. R package version 7.2.3. https://CRAN.R-project.org/package=roxygen2
Wickham H, François R, Henry L, Müller K, Vaughan D. 2023b. dplyr: A grammar of data manipulation. R package version 1.1.3. https://CRAN.R-project.org/package=dplyr
Wickham H, Seidel D. 2022. scales: Scale functions for visualization. R package version 1.2.1. https://CRAN.R-project.org/package=scales
Wickham H, Vaughan D, Girlich M. 2023c. tidyr: Tidy messy data. R package version 1.3.0. https://CRAN.R-project.org/package=tidyr
Xie Y. 2023. knitr: A general-purpose package for dynamic report generation in R. R package version 1.43. https://yihui.org/knitr/

Academia

Parzen E. 1962. On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33: 1065–1076. https://doi.org/10.1214/aoms/1177704472
Roa-Ovalle F. 2019. Poliploidia e duplicação genômica nas leguminosas brasileiras. In: Rocha LL da (ed) Sociedade Botânica do Brasil. https://70cnbot.botanica.org.br/wp-content/uploads/2019/11/Livro-70%C2%BA-Congresso-Nacional-de-Bot%C3%A2nica..pdf
Rosenblatt M. 1956. Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics, 27: 832–837. https://doi.org/10.1214/aoms/1177728190