The robust re-scaling transformation (RR) is a transformation that helps reveal latent structure in data. It transforms the data in three steps: a Gaussianizing transformation, a z-score transformation, and an outlier-removal transformation.
The sequence of these transformations helps focus classic statistical analyses on consequential variance in the data rather than having the analyses be dominated by variation resulting from measurement scale or outliers.
If you have not already read the basic vignette “Rescaling Data”, we recommend reading it first.
Typically, the input to RR is a matrix or data.frame, and the output is a matrix or data.frame of the same size but with re-scaled values. In this vignette, however, we explore how rrscale may also be applied to ragged matrices, data frames, or lists.
First, let’s create some ragged data. We will generate data that cannot be put into a matrix because each observation has a different length:
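A minimal sketch of one way to build such a ragged list, assuming Poisson-distributed lengths and log-normal values. Both choices, and the seed, are illustrative assumptions, so the numbers will differ from the output shown below.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible

# Assumed setup: 10 observations whose lengths vary around 20 (at least 1).
sizes <- pmax(rpois(10, lambda = 20), 1)

# Each observation is a positive, right-skewed numeric vector.
data <- lapply(sizes, function(n) rlnorm(n))

# The lengths differ, so the list cannot be stored as a regular matrix.
str(data)
```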
## List of 10
## $ : num [1:22] 0.3971 0.0881 1.7222 5.3849 1.888 ...
## $ : num [1:23] 0.152 0.104 0.35 1.185 1.036 ...
## $ : num [1:19] 1.284 0.871 0.235 0.361 0.275 ...
## $ : num [1:17] 0.6 0.129 0.549 2.843 0.344 ...
## $ : num [1:30] 0.591 1.101 2.055 0.57 0.366 ...
## $ : num [1:22] 0.112 0.303 0.877 0.391 1.204 ...
## $ : num [1:18] 0.3513 0.2334 0.5813 2.102 0.0117 ...
## $ : num [1:15] 2.646 3.692 0.797 1.338 2.147 ...
## $ : num [1:17] 0.417 0.307 0.155 0.357 2.066 ...
## $ : num [1:21] 1.656 0.473 0.35 2.121 0.459 ...
We can still pass this list to RR and have it transformed:
Notice that the output of rrscale takes the same form as the input data. In this case, it is a list of 10 numeric vectors:
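The call itself might look like the following sketch. It assumes the rrscale package is installed and uses a stand-in ragged list so the block runs on its own. The name `rr.out` matches the object used in the plotting code later in this vignette, and default arguments are assumed throughout.

```r
# Stand-in ragged list of positive values (illustrative only).
set.seed(1)
data <- lapply(pmax(rpois(10, lambda = 20), 1), function(n) rlnorm(n))

# Run only when rrscale is available, so the sketch does not fail without it.
if (requireNamespace("rrscale", quietly = TRUE)) {
  rr.out <- rrscale::rrscale(data)
  # rr.out$RR holds the re-scaled values, in the same list shape as `data`.
  str(rr.out$RR)
}
```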
## List of 10
## $ : num [1:22] -0.388 -1.383 0.996 2.46 1.099 ...
## $ : num [1:23] -1.064 -1.293 -0.485 0.596 0.461 ...
## $ : num [1:19] 0.679 0.293 -0.777 -0.461 -0.665 ...
## $ : num [1:17] -0.0454 -1.1622 -0.1232 1.5906 -0.499 ...
## $ : num [1:30] -0.0587 0.5217 1.197 -0.091 -0.4509 ...
## $ : num [1:22] -1.247 -0.595 0.299 -0.4 0.612 ...
## $ : num [1:18] -0.4827 -0.7804 -0.0734 1.2237 -2.2642 ...
## $ : num [1:15] 1.501 1.93 0.209 0.722 1.248 ...
## $ : num [1:17] -0.35 -0.584 -1.052 -0.471 1.204 ...
## $ : num [1:21] 0.952 -0.246 -0.486 1.234 -0.271 ...
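One way to see how a transformation can return output in the same form as a ragged input is to flatten the list, transform the pooled values, and then restore the original shape. The sketch below does this in base R with a simplified stand-in for the three steps: a log transform, a median/MAD z-score, and masking of values more than 4 units from the center. It is not the algorithm rrscale uses internally, and `toy_rescale` and its `z` argument are hypothetical names chosen for illustration.

```r
# Hypothetical helper: a simplified three-step re-scaling for ragged lists.
toy_rescale <- function(x, z = 4) {
  flat <- unlist(x)                 # pool all values into one vector
  g <- log(flat)                    # stand-in Gaussianizing step (needs positive data)
  zs <- (g - median(g)) / mad(g)    # robust z-score from the pooled median and MAD
  zs[abs(zs) > z] <- NA             # mask extreme values as missing
  relist(zs, skeleton = x)          # restore the original list structure
}

set.seed(1)
ragged <- lapply(c(5, 8, 3), function(n) rlnorm(n))
out <- toy_rescale(ragged)
str(out)
```

In this sketch the location and scale are estimated from the pooled values, so every observation ends up on a common scale regardless of its length.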
We can compare the untransformed and the transformed data:
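library('ggplot2')
library('reshape2')
# Combine the untransformed and RR-transformed values into one data frame
df <- data.frame(untrans = unlist(data), rr = unlist(rr.out$RR))
# Reshape to long format: one row per value, labelled by its source column
df <- melt(df, measure.vars = 1:2)
# Draw one histogram panel per variable
ggplot(data = df, mapping = aes(x = value, fill = variable)) + geom_histogram() + facet_wrap(~variable)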
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
We can also still use the transformation function to transform data, as before. For example, to apply only the “G”-step, we can call:
## List of 10
## $ : num [1:22] -0.829 -1.843 0.58 2.072 0.686 ...
## $ : num [1:23] -1.5182 -1.7509 -0.9277 0.1731 0.0359 ...
## $ : num [1:19] 0.258 -0.136 -1.226 -0.904 -1.111 ...
## $ : num [1:17] -0.48 -1.618 -0.559 1.187 -0.942 ...
## $ : num [1:30] -0.4938 0.0976 0.7856 -0.5267 -0.8934 ...
## $ : num [1:22] -1.704 -1.04 -0.129 -0.841 0.19 ...
## $ : num [1:18] -0.926 -1.229 -0.509 0.813 -2.741 ...
## $ : num [1:15] 1.095 1.533 -0.221 0.301 0.838 ...
## $ : num [1:17] -0.79 -1.029 -1.505 -0.914 0.792 ...
## $ : num [1:21] 0.536 -0.685 -0.929 0.823 -0.71 ...