library(tensorMiss)
We go through some basic functions related to tensor in ‘tensorMiss’ in this vignette. First, we start with constructing a tensor, which is by using the base class ‘array’ in R. The following tensor is an order-\(3\) tensor with dimensions \(3\times 4\times 2\).
<- array(1:24, dim=c(3,4,2)) example
As we can see, subsetting and truncating in multi-dimensional arrays are trivial, and we display them below by an example to inject missingness inside the tensor.
3,1,1] <- NA
example[print(example)
#> , , 1
#>
#> [,1] [,2] [,3] [,4]
#> [1,] 1 4 7 10
#> [2,] 2 5 8 11
#> [3,] NA 6 9 12
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4]
#> [1,] 13 16 19 22
#> [2,] 14 17 20 23
#> [3,] 15 18 21 24
We now quickly go through the basic unfolding and refolding functions as an example.
.1 <- unfold(example, 1)
exampleprint(example.1)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1 4 7 10 13 16 19 22
#> [2,] 2 5 8 11 14 17 20 23
#> [3,] NA 6 9 12 15 18 21 24
Without doubt, the refolding of an unfolding matrix returns back to the original tensor, given the correct dimension.
refold(example.1, 1, dim(example))==example
#> , , 1
#>
#> [,1] [,2] [,3] [,4]
#> [1,] TRUE TRUE TRUE TRUE
#> [2,] TRUE TRUE TRUE TRUE
#> [3,] NA TRUE TRUE TRUE
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4]
#> [1,] TRUE TRUE TRUE TRUE
#> [2,] TRUE TRUE TRUE TRUE
#> [3,] TRUE TRUE TRUE TRUE
Lastly, k-mode matrix product is performed in the following. See Kolda and Bader (2009) for more details on tensor data.
ttm(example, matrix(1:4,nrow=2), 3)
#> , , 1
#>
#> [,1] [,2] [,3] [,4]
#> [1,] 40 52 64 76
#> [2,] 44 56 68 80
#> [3,] NA 60 72 84
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4]
#> [1,] 54 72 90 108
#> [2,] 60 78 96 114
#> [3,] NA 84 102 120
A factor-based imputation method is proposed by Cen and Lam (2024) on tensor time series. The ‘tensor_gen’ function initialises a zero-mean mode-\(K\) tensor time series with factor structure, so that at each time \(t\in\{1,2,\dots,T\}\),
\[
\mathcal{Y}_t = \mathcal{F}_t \times_1 \mathbf{A}_1 \times_2
\mathbf{A}_2 \times_3 \cdots \times_K \mathbf{A}_K + \mathcal{E}_t ,
\] where \(\mathcal{Y}_t\in\mathbb{R}^{d_1\times
d_2\times\cdots\times d_K}\) is the generated order-\(K\) tensor data, \(\mathcal{F}_t\in\mathbb{R}^{r_1\times
r_2\times\cdots\times r_K}\) is the (possibly weak) core factor
series, each \(\mathbf{A}_k\in\mathbb{R}^{d_k\times r_k}\)
for \(k\in\{1,\dots,K\}\) is the
mode-\(k\) factor loading matrix, and
\(\mathcal{E}_t\) is the error series
with the same dimension as \(\mathcal{Y}_t\). Weak cross-sectional and
serial correlations are allowed in the fibres of the error series. See
Cen and Lam (2024) for the details.
The purpose of the imputation given a tensor time series with
missingness is to estimate/impute the missing entries. First, the data
can be initialised by the ‘tensor_gen’ function. For reproducibility, a
seed parameter is required, which is 2023 by default.
<- 3 #order 3 at each time
K <- 20 #time length
TT <- c(30,30,30) #spatial dimensions
d <- c(2,2,2) #rank of core tensor
r <- c(2,2,2) #rank of common component in error
re <- list(c(0,0), c(0,0), c(0,0)) #strong factors
eta <- c(0.7, 0.3, -0.4, 0.2, -0.1) #AR(5) coefficients for core factor
coef_f <- c(-0.7, -0.3, -0.4, 0.2, 0.1) #AR(5) coefficients for common component in error
coef_fe <- c(0.8, 0.4, -0.4, 0.2, -0.1) #AR(5) coefficients for idiosyncratic part in error
coef_e <- tensor_gen(K,TT,d,r,re,eta, coef_f, coef_fe, coef_e) data_test
Missing entries are represented by ‘NA’s in the data. For example, each entry is randomly missing with probability 0.3 using the ’miss_gen’ function. More missing patterns are available with the function.
<- miss_gen(data_test$X) data_miss
With the ‘miss_factor_est’ function, the factor structure can be
estimated in one go. The number of factors could be either provided or
not provided, in the latter case the function estimates the number of
factors using the eigenvalue-ratio-based estimator. For the details of
estimation, see Cen and Lam (2024).
As an example, the factor loading error measured by column space
distance are computed using the ‘fle’ function.
<- miss_factor_est(data_miss, r)
est_result fle(est_result$A[[1]], data_test$A[[1]])
#> [1] 0.01580778
Lastly, we can gauge the imputation performance using relative MSE (rMSE) or even the introduced q-MSE. Setting q as the length of the input vector would essentially output the relative MSE. Examples of rMSE and q-MSE with q as 100 are demonstrated.
qMSE(c(data_test$C), c(est_result$imputation), length(c(data_test$C))) #rMSE
#> [1] 0.0006705599
qMSE(c(data_test$C), c(est_result$imputation), 100) #q-MSE
#> [1] 5.78442e-06
Under certain conditions (Cen and Lam 2024), the residue between the row of the estimated loading matrix and its corresponding true row under some rotation can be shown to be asymptotically normal. A consistent covariance matrix estimator can be computed by the ‘sigmaD’ function. For instance, to compute the residue on the first row of the estimated mode-2 loading matrix, the covariance matrix estimator is obtained below. The rotation matrix is also computed afterwards.
# computing the covariance matrix estimator
<- r[2]
r2 <- data_test$A[[2]]
A2 <- floor(0.2*(TT^0.25 * (d[2])^0.25))
beta <- diag(x=(svd(est_result$covMatrix[[2]])$d)[1:r2], nrow=r2, ncol=r2)
D2 # HAC_cov: HAC-type covariance matrix estimator
<- sigmaD(2, D2, est_result$A[[2]], est_result$imputation, data_miss, 1, beta)
HAC_cov
# computing the rotation matrix
<- data_test$A[[3]] %x% data_test$A[[1]]
Amk <- 0
R_ast for (t in 1:TT){
<- R_ast + unfold(data_test$Ft[t,,,],2) %*% t(Amk) %*% Amk %*% t(unfold(data_test$Ft[t,,,],2))
R_ast
}<- A2 %*% R_ast %*% t(A2)
R_ast <- R_ast/TT
R_ast <- diag(x = diag(t(A2) %*% A2), nrow=r2, ncol=r2)
Z2 <- A2 %*% diag(x=diag(solve(Z2))^0.5, nrow=r2, ncol=r2)
Q2 # H2: rotation matrix
<- solve(D2) %*% t(est_result$A[[2]]) %*% R_ast %*% Q2 %*% solve(t(Q2)%*% Q2) H2
Eventually, the standardised residue is shown below and should follow a standard normal distribution when dimensions are increased.
<- eigen(HAC_cov)
HAC_cov.eigen <- HAC_cov.eigen$vectors %*% diag(sqrt(HAC_cov.eigen$values)) %*% solve(HAC_cov.eigen$vectors)
HAC_cov.sqrt <- (solve(HAC_cov.sqrt) %*% D2) %*% (matrix(est_result$A[[2]], nrow=d[2], ncol=r2)[1,] - (H2 %*% Q2[1,]))
A2_1
A2_1#> [,1]
#> [1,] 3.482375
#> [2,] -0.951442