Matrices are a fundamental component of any scientific programming
language. In fact, MATLAB, a scientific language used by millions of
engineers, is an acronym for MATrix LABoratory. In R, there are several
ways to create and manipulate matrices. However, R’s matrix syntax is
quite different from that of most other programming languages, most of which
share a familiar syntax. In this vignette, we discuss
ramify, a simple package providing R with additional matrix
functionality.
There are several methods available for constructing matrices in R.
The usual approach is to use the base function matrix. For
example, the following snippet of code creates the matrix \(\bigl(\begin{smallmatrix} 1 & 2 & 3 &
4 \\ 5 & 6 & 7 & 8 \end{smallmatrix} \bigr)\):
matrix(1:8, nrow = 2, byrow = TRUE)
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
Notice that we need to specify the shape of the matrix (here we used
nrow = 2, but could also have set ncol = 4,
or both). Also, by default, R fills matrices using column-major order.
To fill the matrix using row-major order, we had to set
the byrow argument in matrix to
TRUE. (Even though we set byrow = TRUE, the
matrix is still stored in column-major order. This may have unintended
side effects. For instance, removing the dimension attribute flattens
the matrix into a vector by concatenating its columns.) Although
the code is simple, users coming from other scientific languages may
prefer a more familiar syntax.
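The column-major storage mentioned in the parenthetical note above is easy to check directly. The snippet below uses only base R (nothing from ramify): it drops the "dim" attribute of a matrix that was filled with byrow = TRUE and shows that the elements come back column by column.

```r
# A 2 x 4 matrix filled row by row
m <- matrix(1:8, nrow = 2, byrow = TRUE)

# Removing the "dim" attribute exposes the underlying column-major storage
v <- m
dim(v) <- NULL
print(v)
## [1] 1 5 2 6 3 7 4 8

# Transposing first recovers the elements in row-major order
print(as.vector(t(m)))
## [1] 1 2 3 4 5 6 7 8
```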
The following table gives some examples of constructing matrices in five of the most popular scientific languages.
| Language | Syntax | Note |
|---|---|---|
| Julia | [1 2 3 4; 5 6 7 8] | |
| Mathematica | {{1, 2, 3, 4}, {5, 6, 7, 8}} | |
| MATLAB/GNU Octave | [1, 2, 3, 4; 5, 6, 7, 8] | commas may be omitted |
| Python | [[1, 2, 3, 4], [5, 6, 7, 8]] | |
| Python+NumPy | numpy.mat("1, 2, 3, 4; 5, 6, 7, 8") | commas may be omitted |
The most convenient feature of the syntax in the second column is the visual separation of rows, which makes the structure of the matrix easy to see. The traditional matrix function in R offers no such separation unless you manually put each row on its own line:
matrix(c(1, 2, 3, 4,
         5, 6, 7, 8), nrow = 2, byrow = TRUE)
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
But this is rather hacky in comparison and still requires the user to
specify how the vector should be split up (e.g., nrow = 2
and byrow = TRUE). The ramify package’s main
function mat extends matrix to support
this more common syntax.
The ramify package is hosted on GitHub at . It is also
available on CRAN. To install the latest stable release from CRAN:
install.packages("ramify")  # stable release
To install the development version from GitHub, you can use the devtools
package:
# install.packages("devtools")
devtools::install_github("bgreenwell/ramify")  # development version
Bug reports or issues should be submitted to . Suggestions for
improvement can be emailed directly to the package maintainer. The
version of ramify used in this paper is 0.3.1.
The mat function
The main function in this package is mat. For all
intents and purposes, mat is simply an extension of the
base function matrix that adds two new S3 methods: a method
for class "character" and another for class
"list". Fortunately, since mat is simply a
wrapper around matrix, we can still use it in the exact
same way. That is, any use of matrix also applies to
mat.
Function mat provides a new way of creating matrices
using a convenient string initializer (not unlike the matrix constructor
in NumPy). For instance, we can recreate the matrix \(\bigl(\begin{smallmatrix} 1 & 2 & 3 &
4 \\ 5 & 6 & 7 & 8 \end{smallmatrix} \bigr)\) using
mat as follows:
library(ramify)
## 
## Attaching package: 'ramify'
## The following object is masked from 'package:graphics':
## 
##     clip
mat("1, 2, 3, 4; 5, 6, 7, 8")
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
The colon operator can also be used, as in
mat("1:4; 5:8").
The character method of mat is very simple. First, the
character string is split on the semicolons, creating character strings
representing the row vectors. Second, the resulting character strings
are further split on commas. The individual elements are then parsed,
evaluated, and fed to matrix.
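The three steps can be sketched in a few lines of base R. This is a minimal illustration, not ramify's source: the function name mat_sketch is hypothetical, it assumes every row (or column) yields the same number of elements, and, like the description above, it evaluates each piece with eval(parse()), so it should only be used on trusted strings.

```r
# Hypothetical re-implementation of the three steps described above;
# not ramify's actual source code.
mat_sketch <- function(x, rows = TRUE, sep = ",") {
  # Step 1: split on semicolons, one string per row (or column)
  pieces <- strsplit(x, split = ";", fixed = TRUE)[[1]]

  # Step 2: split each piece on `sep` (a regular expression), then parse and
  # evaluate every element; sep = NULL evaluates each piece as a whole
  vecs <- lapply(pieces, function(piece) {
    elems <- if (is.null(sep)) piece else strsplit(piece, split = sep)[[1]]
    unlist(lapply(elems, function(e) eval(parse(text = e))))
  })

  # Step 3: feed the values to matrix(); assumes equal-length pieces
  values <- unlist(vecs)
  if (rows) {
    matrix(values, nrow = length(vecs), byrow = TRUE)
  } else {
    matrix(values, ncol = length(vecs))
  }
}

mat_sketch("1, 2, 3, 4; 5, 6, 7, 8")   # 2 x 4, filled row by row
mat_sketch("1:4; 5:8", rows = FALSE)   # 4 x 2, filled column by column
```

Because the split happens before parsing, a comma inside a function call (as in rnorm(10, sd = 3)) breaks this sketch in the same way the text describes, while sep = NULL avoids it.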
The first argument to mat (and the only one required)
should be a character string in which semicolons separate row vectors
and commas separate individual elements. In addition, at the time of this
writing, there are two optional arguments: rows and
sep. rows accepts a logical indicating whether
the semicolon separates rows (rows = TRUE) or columns
(rows = FALSE). The default is TRUE. For
instance, to create the matrix \[
  \begin{pmatrix} 1 & 5 \\ 2 & 6 \\ 3 & 7 \\ 4 & 8
\end{pmatrix}
\] just write
##      [,1]  [,2] 
## [1,] "1:4" "5:8"
The second optional argument, sep, accepts a character
vector containing regular expressions to use for splitting up the
individual elements within each row/column. By default,
sep = ",". To separate individual elements by something other
than commas, set sep to another regular expression. For example,
to use spaces instead of commas, write
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
To bypass setting these options every time, the user can change them globally with
R functions can be used within the character string as well, but one
must be careful. For example, mat("rnorm(10)") works
because there are no semicolons or commas; hence, the character string
is simply parsed and evaluated. However,
mat("rnorm(10, sd = 3)") produces an error because
mat will split the character string on the comma, resulting
in two substrings that cannot be parsed:
## [[1]]
## [1] "rnorm(10" " sd = 3)"
One way around this is to set the sep option
to NULL:
##      [,1]                  
## [1,] "rnorm(5)"            
## [2,] " rnorm(5, mean = 10)"
There is often a need to construct matrices from the elements of a list. For example, I tend to store the results of simulations in a list and later want to treat the elements (usually vectors of results) as the rows/columns of a matrix. While this is not difficult to do manually, I frequently have to stop and think for a minute about the best way to accomplish this.
For example, suppose we want to take the list
z <- list(a = 1:10, b = 11:20, c = 21:30)
and convert it into a matrix of the form \[ \begin{pmatrix} 1 & 2 & 3 & \cdots & 10 \\ 11 & 12 & 13 & \cdots & 20 \\ 21 & 22 & 23 & \cdots & 30 \end{pmatrix} \] Three approaches come to mind:
matrix(unlist(z), nrow = 3, byrow = TRUE)  # approach 1
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    2    3    4    5    6    7    8    9    10
## [2,]   11   12   13   14   15   16   17   18   19    20
## [3,]   21   22   23   24   25   26   27   28   29    30
do.call(rbind, z)  # approach 2
##   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## a    1    2    3    4    5    6    7    8    9    10
## b   11   12   13   14   15   16   17   18   19    20
## c   21   22   23   24   25   26   27   28   29    30
t(simplify2array(z))  # approach 3
##   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## a    1    2    3    4    5    6    7    8    9    10
## b   11   12   13   14   15   16   17   18   19    20
## c   21   22   23   24   25   26   27   28   29    30
All three approaches succeed in constructing the correct matrix.
Using matrix is simple, but requires the user to flatten
the list first and specify the number of rows (or, equivalently, the
number of columns) ahead of time. Approach two is probably the best, but
novice users are not likely to be familiar with do.call or
the rbind and cbind method of combining vectors
in R. Similarly, in the third approach, new users are not likely to be
familiar with simplify2array, and the user has to take the
transpose of the resulting matrix.
Using mat we can construct the matrix as follows:
mat(z)
##   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## a    1    2    3    4    5    6    7    8    9    10
## b   11   12   13   14   15   16   17   18   19    20
## c   21   22   23   24   25   26   27   28   29    30
Notice how the element names are preserved as row names. We could
force the list elements to be columns instead by setting
rows = FALSE. Similar to approach two, mat
uses do.call with rbind and cbind
to construct matrices from lists.
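The do.call approach that mat relies on fits in a couple of lines of base R. The following is a hedged sketch: the function name list_to_matrix is hypothetical, and its rows argument is only modeled on the one described above; it is not ramify's implementation.

```r
# Hypothetical helper; the name list_to_matrix is illustrative only
list_to_matrix <- function(x, rows = TRUE) {
  # do.call() passes the list elements as (named) arguments, so the
  # element names become row names (rbind) or column names (cbind)
  do.call(if (rows) rbind else cbind, x)
}

z <- list(a = 1:10, b = 11:20, c = 21:30)
list_to_matrix(z)                # 3 x 10, row names a, b, c
list_to_matrix(z, rows = FALSE)  # 10 x 3, column names a, b, c
```

The design relies on the fact that rbind and cbind use argument names as dimension names, which is why the list names carry over without any extra code.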
Printing matrices in R can be troublesome. By default, R will try to
print the entire matrix to the screen until it reaches its limit (given
by getOption("max.print"), which defaults to 99999 entries).
If there are enough columns, they will also wrap onto multiple lines.
It is therefore not very useful to print even moderately large
matrices.
Advanced users typically use the head and
tail functions for viewing, respectively, the first and
last few rows of a matrix or data frame. However, if there are a lot of
columns, they will still spill over onto the next row on the screen. To
see this, try running the following
A new generic function pprint (which stands for pretty
print) is included with the package. S3 methods for objects of class
"matrix" and "data.frame" are also available.
Since it is a generic, users can add new methods to suit their needs
(e.g., special printing for lists).
With pprint, large matrices are printed in a much nicer
way. For instance,
pprint(randn(1000, 1000))
## 1000 x 1000 matrix of doubles: 
## 
##               [,1]       [,2]       [,3] ...    [,1000]
## [1,]     0.4678421  0.5993560 -0.8649321 ... -1.2867138
## [2,]     1.3604368 -1.3345235  0.6537629 ...  0.7468100
## [3,]     2.3225528  0.4698787  0.9767913 ... -0.1254003
## ...            ...        ...        ... ...        ...
## [1000,] -0.5507721 -0.1482729  0.5279729 ...  0.6056493
This generates a \(1000 \times 1000\) matrix of normally distributed random numbers, but only the first few rows and columns, along with the last row and column, are printed. The dimensions and storage mode are also printed above the matrix.
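The truncated layout is not hard to imitate in base R. As a rough sketch (the function corner_view is hypothetical, and its output only approximates what pprint prints), we keep the first n rows and columns plus the last ones and insert "..." wherever rows or columns are skipped.

```r
# Hypothetical helper imitating the layout described above; not ramify code.
corner_view <- function(x, n = 3) {
  stopifnot(is.matrix(x))
  nr <- nrow(x)
  nc <- ncol(x)
  # Keep the first n rows/columns plus the last one when the matrix is large
  ri <- if (nr > n + 1) c(seq_len(n), nr) else seq_len(nr)
  ci <- if (nc > n + 1) c(seq_len(n), nc) else seq_len(nc)
  out <- format(x[ri, ci, drop = FALSE])  # character matrix, common width
  dimnames(out) <- list(paste0("[", ri, ",]"), paste0("[,", ci, "]"))
  # Insert a column of "..." before the last column if columns were skipped
  if (nc > n + 1) {
    gap <- matrix("...", nrow = nrow(out), ncol = 1, dimnames = list(NULL, "..."))
    out <- cbind(out[, seq_len(n), drop = FALSE], gap, out[, n + 1, drop = FALSE])
  }
  # Insert a row of "..." before the last row if rows were skipped
  if (nr > n + 1) {
    gap <- matrix("...", nrow = 1, ncol = ncol(out), dimnames = list("...", NULL))
    out <- rbind(out[seq_len(n), , drop = FALSE], gap, out[n + 1, , drop = FALSE])
  }
  # Header with the dimensions and storage mode, then the trimmed matrix
  cat(nr, "x", nc, "matrix of", typeof(x), "\n\n")
  print(out, quote = FALSE)
  invisible(out)
}

corner_view(matrix(rnorm(1000 * 1000), nrow = 1000))
```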
This printing behavior is also useful for viewing large numeric data
frames. pprint will convert data frames to a matrix via the
base function data.matrix. Consequently, logicals and
factors will be converted to integers before being printed. For data
frames, nicer printing is available via the "tbl_df"
class from the popular dplyr package.
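The conversion that pprint relies on for data frames can be seen with base R's data.matrix alone. The toy data frame below is made up for illustration; it shows logical columns becoming 0/1 and factor columns becoming their integer codes.

```r
# A small data frame mixing numeric, logical, and factor columns
df <- data.frame(
  x    = c(1.5, 2.5),
  flag = c(TRUE, FALSE),
  grp  = factor(c("b", "a"))   # levels are sorted: "a", "b"
)

# data.matrix() replaces logicals by 0/1 and factors by their integer codes
m <- data.matrix(df)
print(m)
```

Note that grp is printed as 2 and 1 (the positions of "b" and "a" among the sorted levels), not as the original labels.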
Package ramify also provides two functions similar to
mat, namely, dmat and bmat.
Function dmat works exactly the same as mat
but results in a data frame, rather than a matrix. The bmat
function constructs block matrices using a character string
initializer.
Converting a list to a data frame in R is rather simple. An example is given in the code snippet below.
# List holding individual variables
z1 <- list(
  Height = c(Joe = 6.2, Mary = 5.7, Pete = 6.1),
  Weight = c(Joe = 192.2, Mary = 164.3, Pete = 201.7),
  Gender = c(Joe = 0, Mary = 1, Pete = 0)
)
as.data.frame(z1)  # convert z1 to a data frame
##      Height Weight Gender
## Joe     6.2  192.2      0
## Mary    5.7  164.3      1
## Pete    6.1  201.7      0
When applied to a list, as.data.frame treats the
individual elements (i.e., Height, Weight, and
Gender) as columns. However, it is often the case that list
elements represent records, rather than individual variables. (This is a
common use of Python dictionaries, which are naturally imported into R
as lists.) For example, consider the same list, but in a different
format:
# List holding records (i.e., individual observations)
z2 <- list(
  Joe = c(Height = 6.2, Weight = 192.2, Gender = 0),
  Mary = c(Height = 5.7, Weight = 164.3, Gender = 1),
  Pete = c(Height = 6.1, Weight = 201.7, Gender = 0)
)
Here, each list element represents an individual record. To convert this to a tidy data frame (in a “tidy” data frame, each variable forms a column and each record forms a row), the necessary code in base R is
as.data.frame(t(as.data.frame(z2)))
##      Height Weight Gender
## Joe     6.2  192.2      0
## Mary    5.7  164.3      1
## Pete    6.1  201.7      0
That is, we first convert it to a data frame, then transpose the
result (which converts the data frame to a "matrix"
object), and then convert it back to a data frame. Using
dmat is much simpler:
##      Height Weight Gender
## Joe     6.2  192.2      0
## Mary    5.7  164.3      1
## Pete    6.1  201.7      0
##      Height Weight Gender
## Joe     6.2  192.2      0
## Mary    5.7  164.3      1
## Pete    6.1  201.7      0
Suppose we have three matrices \[ A_1 = \begin{pmatrix} 1 & 2 \\ 5 & 6 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 3 & 4 \\ 7 & 8 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 9 & 10 & 11 & 12 \end{pmatrix} \] and want to construct the block matrix defined by \[ A = \begin{pmatrix} A_1 & A_2 \\ A_3 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{pmatrix} \] In base R, we could accomplish this with the following code:
A1 <- matrix(c(1, 2, 5, 6), nrow = 2, byrow = TRUE)
A2 <- matrix(c(3, 4, 7, 8), nrow = 2, byrow = TRUE)
A3 <- matrix(c(9, 10, 11, 12), nrow = 1)
A <- rbind(cbind(A1, A2), A3)
This can become rather complicated, depending on the structure of the blocks. Specifying the matrices via character strings is more natural and greatly simplifies the task:
bmat("A1, A2; A3")
This function may be familiar to heavy Python+NumPy users since NumPy
has a similar function also called bmat.
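To make the idea concrete, here is a minimal base-R sketch of a string-based block-matrix builder. The name bmat_sketch is hypothetical and this is not ramify's implementation; it assumes the block names refer to matrices that exist in the caller's environment and that the blocks have compatible dimensions.

```r
# Hypothetical sketch of a string-based block-matrix builder; not ramify code.
bmat_sketch <- function(x, envir = parent.frame()) {
  force(envir)  # capture the caller's environment before any nested calls
  # Split the string into block rows (";") and each block row into names (",")
  block_rows <- strsplit(x, split = ";", fixed = TRUE)[[1]]
  row_mats <- lapply(block_rows, function(r) {
    nms <- trimws(strsplit(r, split = ",", fixed = TRUE)[[1]])
    # Look up each named matrix and join the blocks side by side
    do.call(cbind, lapply(nms, get, envir = envir))
  })
  # Stack the block rows on top of each other
  do.call(rbind, row_mats)
}

A1 <- matrix(c(1, 2, 5, 6), nrow = 2, byrow = TRUE)
A2 <- matrix(c(3, 4, 7, 8), nrow = 2, byrow = TRUE)
A3 <- matrix(c(9, 10, 11, 12), nrow = 1)
bmat_sketch("A1, A2; A3")  # same as rbind(cbind(A1, A2), A3)
```

The string is simply a compact description of the nested cbind/rbind calls used in the base R version.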
The main functions in this package are mat,
dmat, and bmat, which can be viewed as
extensions of matrix; however, a number of convenience
functions are also available, most of which are listed in the table below. Some of these
functions appear in other R packages as well. Of particular note are the
matlab
and pracma
packages.
| Function(s) | Description |
|---|---|
| argmax, argmin | Find the position of the maximum or minimum in each row or column of a matrix. |
| eye | Construct an identity matrix. |
| hcat, vcat | Concatenate matrices. |
| fill, ones, zeros, trues, falses | Fill a matrix or array with a particular value. |
| flatten | Flatten or collapse a matrix into one dimension. |
| inv | Compute the inverse of a square matrix. |
| linspace, logspace | Construct a vector of linearly-spaced or logarithmically-spaced elements. |
| meshgrid | Construct rectangular 2-D grids (useful for plotting). |
| rand, randi, randn | Construct a matrix or array of pseudorandom numbers. |
| size, resize | Extract or change the size and shape of a matrix or array. |
| tr | Compute the trace of a matrix. |
| tri, tril, triu | Construct or extract lower and upper triangular matrices. |
Although the functionality listed in the table above can be accomplished
in base R (though not necessarily through a simple function), the
ramify functions are simple and more familiar to most
Julia, MATLAB/Octave, and Python+NumPy users, and thus make the
transition to R easier. linspace and logspace
are two such functions. For example,
linspace(1+2i, 10+10i, 8) creates a vector of complex
numbers with 8 evenly spaced points between 1+2i and
10+10i. In base R, we would use
seq(1+2i, 10+10i, length = 8). There is no base R
equivalent to logspace.
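Both behaviors can be approximated with seq. The helpers below are hypothetical stand-ins, not ramify's functions; for the logarithmic version we assume the NumPy convention, in which the endpoints are given as exponents of base.

```r
# Hypothetical base-R stand-ins; the names and the NumPy-style convention for
# logspace (endpoints given as exponents) are assumptions, not ramify's API.
linspace_sketch <- function(a, b, n = 100) seq(a, b, length.out = n)
logspace_sketch <- function(a, b, n = 100, base = 10) base^seq(a, b, length.out = n)

linspace_sketch(0, 1, n = 5)
## [1] 0.00 0.25 0.50 0.75 1.00
logspace_sketch(0, 3, n = 4)
## [1]    1   10  100 1000
```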
Another example is finding the trace of a matrix. To obtain the trace
of a matrix in base R the user must manually sum the diagonal entries.
Using the ramify function tr is simpler:
## [1] 5.634442
## [1] 5.634442
The function is called tr, rather than
trace, to avoid conflicts with a base R function called
trace that is used for interactive tracing and
debugging.
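For reference, the manual base R computation is a one-liner. The small wrapper below is hypothetical (it is not ramify's tr); it only adds a check that the matrix is square, since the trace is defined for square matrices.

```r
# Base R: the trace is the sum of the diagonal entries
m <- matrix(1:9, nrow = 3)   # diagonal is 1, 5, 9
sum(diag(m))
## [1] 15

# A hypothetical one-line helper with a trace-style name (not ramify's tr)
tr_sketch <- function(x) {
  stopifnot(is.matrix(x), nrow(x) == ncol(x))  # trace needs a square matrix
  sum(diag(x))
}
tr_sketch(m)
## [1] 15
```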
Many popular scientific languages also provide convenient ways to
generate vectors and matrices of pseudo-random numbers. Commonly found
functions are rand (for uniform random numbers) and
randn (for normally distributed random numbers). These and
more are available from ramify. We saw basic use of
randn in the section describing the pprint
function. The following snippet of code compares creating a \(100 \times 100 \times 2\) array of \(\mathcal{U}\left(0, 1\right)\) random
deviates in both base R and ramify via the
rand function.
a <- array(runif(20000), c(100, 100, 2)) # base R
a <- rand(100, 100, 2) # ramify
pprint(a[, , 1])  # print the first matrix
## 100 x 100 matrix of doubles: 
## 
##             [,1]      [,2]      [,3] ...    [,100]
## [1,]   0.6530737 0.6379830 0.2114788 ... 0.4139361
## [2,]   0.9459583 0.9128399 0.2300934 ... 0.4334866
## [3,]   0.2319393 0.9550313 0.2696696 ... 0.5444542
## ...          ...       ...       ... ...       ...
## [100,] 0.3558624 0.9739206 0.1575685 ... 0.7652998
Of course, the base R approach is more flexible, as it allows you to
use any of the built-in distributions (e.g., binomial). However,
generating uniform (continuous or discrete) and normal random variates
is far more common, which explains the popularity of the rand,
randi, and randn functions often seen in other
languages.
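The convenience of these functions comes mostly from passing the dimensions directly. As an illustration only, the helpers below wrap the corresponding base R generators; their names and signatures are hypothetical and may differ from ramify's rand, randn, and randi.

```r
# Hypothetical helpers: dimensions are passed as separate arguments, as in
# rand(100, 100, 2) above. Names and signatures are illustrative only.
rand_sketch <- function(...) {
  d <- c(...)
  array(runif(prod(d)), dim = d)          # U(0, 1) deviates
}
randn_sketch <- function(...) {
  d <- c(...)
  array(rnorm(prod(d)), dim = d)          # N(0, 1) deviates
}
randi_sketch <- function(imax, ...) {
  d <- c(...)
  array(sample.int(imax, prod(d), replace = TRUE), dim = d)  # integers 1..imax
}

a <- rand_sketch(100, 100, 2)
dim(a)
## [1] 100 100   2
```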
As seen in the previous example, many of the functions listed in the table above can be used to construct vectors, matrices, or multi-way arrays just by specifying the extra dimensions. For example,
zeros(10)
##       [,1]
##  [1,]    0
##  [2,]    0
##  [3,]    0
##  [4,]    0
##  [5,]    0
##  [6,]    0
##  [7,]    0
##  [8,]    0
##  [9,]    0
## [10,]    0
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    1    1    1    1    1    1    1    1     1
##  [2,]    1    1    1    1    1    1    1    1    1     1
##  [3,]    1    1    1    1    1    1    1    1    1     1
##  [4,]    1    1    1    1    1    1    1    1    1     1
##  [5,]    1    1    1    1    1    1    1    1    1     1
##  [6,]    1    1    1    1    1    1    1    1    1     1
##  [7,]    1    1    1    1    1    1    1    1    1     1
##  [8,]    1    1    1    1    1    1    1    1    1     1
##  [9,]    1    1    1    1    1    1    1    1    1     1
## [10,]    1    1    1    1    1    1    1    1    1     1
Note that these functions, by default, always return a
"matrix" or "array" object, meaning there is
always a "dim" attribute. In contrast, in base R, vectors
do not have a "dim" attribute. So, when creating a vector
using zeros(10), for example, the result will be a matrix
of zeros with ten rows and one column. To bypass this behavior, you can
use the atleast_2d option:
##  [1] 0 0 0 0 0 0 0 0 0 0
This behavior can also be changed globally using
options(atleast_2d = FALSE).
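The logic behind this default can be sketched in base R. The function zeros_sketch below is hypothetical, not ramify's implementation; as an assumption, its default reads the same global option name mentioned above, falling back to TRUE when the option is unset.

```r
# Hypothetical sketch of the atleast_2d behavior; not ramify's implementation.
zeros_sketch <- function(..., atleast_2d = getOption("atleast_2d", TRUE)) {
  d <- c(...)
  if (length(d) == 1L) {
    # One dimension: a d x 1 matrix by default, or a plain dim-less vector
    if (atleast_2d) matrix(0, nrow = d, ncol = 1) else numeric(d)
  } else {
    array(0, dim = d)
  }
}

dim(zeros_sketch(10))                       # 10 1
dim(zeros_sketch(10, atleast_2d = FALSE))   # NULL, like numeric(10)
dim(zeros_sketch(10, 10))                   # 10 10
```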
As a last example, we shall demonstrate the meshgrid
function. A meshgrid function is available in MATLAB/Octave
and Python+NumPy and is most frequently used to produce input for a 2-D
or 3-D function that will be plotted. It should be noted, however, that
the R version of meshgrid provided by ramify
returns a list of matrices. The following code, for example, plots
contours of the function \[
  y = \cos{\left(x_1^2 + x_2^2\right)} \times
\exp\left(-\frac{\sqrt{x_1^2 + x_2^2}}{6}\right)
\] over the Cartesian grid \(\left[-4\pi, 4\pi\right] \times \left[-4\pi,
4\pi\right]\). The resulting plot is created as follows.
x <- meshgrid(linspace(-4 * pi, 4 * pi, 27)) # list of input matrices
y <- cos(x[[1]]^2 + x[[2]]^2) * exp(-sqrt(x[[1]]^2 + x[[2]]^2) / 6)
par(mar = c(0, 0, 0, 0)) # remove margins
image(y, axes = FALSE) # color image
contour(y, add = TRUE, drawlabels = FALSE)  # add contour lines
Matrices are a fundamental feature of any scientific language. This
vignette introduced the simple R package ramify, which
provides additional matrix functionality in R. Using
ramify, I showed how to craft matrices using: (i) character
strings and (ii) lists. A similar construction of block matrices and
data frames was also briefly discussed. I also provided a quick summary
of a number of convenience functions for easing the transition to R from
other popular scientific languages such as Julia, MATLAB/Octave, and
Python+NumPy.