A-short-introduction-to-tensorMiss

library(tensorMiss)

1 Quick start: unfolding, refolding, and k-mode matrix product

We go through some basic functions related to tensor in ‘tensorMiss’ in this vignette. First, we start with constructing a tensor, which is by using the base class ‘array’ in R. The following tensor is an order-\(3\) tensor with dimensions \(3\times 4\times 2\).

example <- array(1:24, dim=c(3,4,2))

As we can see, subsetting and truncating in multi-dimensional arrays are trivial, and we display them below by an example to inject missingness inside the tensor.

example[3,1,1] <- NA
print(example)
#> , , 1
#> 
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    4    7   10
#> [2,]    2    5    8   11
#> [3,]   NA    6    9   12
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3] [,4]
#> [1,]   13   16   19   22
#> [2,]   14   17   20   23
#> [3,]   15   18   21   24

We now quickly go through the basic unfolding and refolding functions as an example.

example.1 <- unfold(example, 1)
print(example.1)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,]    1    4    7   10   13   16   19   22
#> [2,]    2    5    8   11   14   17   20   23
#> [3,]   NA    6    9   12   15   18   21   24

Without doubt, the refolding of an unfolding matrix returns back to the original tensor, given the correct dimension.

refold(example.1, 1, dim(example))==example
#> , , 1
#> 
#>      [,1] [,2] [,3] [,4]
#> [1,] TRUE TRUE TRUE TRUE
#> [2,] TRUE TRUE TRUE TRUE
#> [3,]   NA TRUE TRUE TRUE
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3] [,4]
#> [1,] TRUE TRUE TRUE TRUE
#> [2,] TRUE TRUE TRUE TRUE
#> [3,] TRUE TRUE TRUE TRUE

Lastly, k-mode matrix product is performed in the following. See Kolda and Bader (2009) for more details on tensor data.

ttm(example, matrix(1:4,nrow=2), 3)
#> , , 1
#> 
#>      [,1] [,2] [,3] [,4]
#> [1,]   40   52   64   76
#> [2,]   44   56   68   80
#> [3,]   NA   60   72   84
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3] [,4]
#> [1,]   54   72   90  108
#> [2,]   60   78   96  114
#> [3,]   NA   84  102  120

2 Missing value imputation for tensor factor models

A factor-based imputation method is proposed by Cen and Lam (2024) on tensor time series. The ‘tensor_gen’ function initialises a zero-mean mode-\(K\) tensor time series with factor structure, so that at each time \(t\in\{1,2,\dots,T\}\),

\[ \mathcal{Y}_t = \mathcal{F}_t \times_1 \mathbf{A}_1 \times_2 \mathbf{A}_2 \times_3 \cdots \times_K \mathbf{A}_K + \mathcal{E}_t , \] where \(\mathcal{Y}_t\in\mathbb{R}^{d_1\times d_2\times\cdots\times d_K}\) is the generated order-\(K\) tensor data, \(\mathcal{F}_t\in\mathbb{R}^{r_1\times r_2\times\cdots\times r_K}\) is the (possibly weak) core factor series, each \(\mathbf{A}_k\in\mathbb{R}^{d_k\times r_k}\) for \(k\in\{1,\dots,K\}\) is the mode-\(k\) factor loading matrix, and \(\mathcal{E}_t\) is the error series with the same dimension as \(\mathcal{Y}_t\). Weak cross-sectional and serial correlations are allowed in the fibres of the error series. See Cen and Lam (2024) for the details.

The purpose of the imputation given a tensor time series with missingness is to estimate/impute the missing entries. First, the data can be initialised by the ‘tensor_gen’ function. For reproducibility, a seed parameter is required, which is 2023 by default.

K <- 3 #order 3 at each time
TT <- 20 #time length
d <- c(30,30,30) #spatial dimensions
r <- c(2,2,2) #rank of core tensor
re <- c(2,2,2) #rank of common component in error
eta <- list(c(0,0), c(0,0), c(0,0)) #strong factors
coef_f <- c(0.7, 0.3, -0.4, 0.2, -0.1) #AR(5) coefficients for core factor
coef_fe <- c(-0.7, -0.3, -0.4, 0.2, 0.1) #AR(5) coefficients for common component in error
coef_e <- c(0.8, 0.4, -0.4, 0.2, -0.1) #AR(5) coefficients for idiosyncratic part in error
data_test <- tensor_gen(K,TT,d,r,re,eta, coef_f, coef_fe, coef_e)

Missing entries are represented by ‘NA’s in the data. For example, each entry is randomly missing with probability 0.3 using the ’miss_gen’ function. More missing patterns are available with the function.

data_miss <- miss_gen(data_test$X)

With the ‘miss_factor_est’ function, the factor structure can be estimated in one go. The number of factors could be either provided or not provided, in the latter case the function estimates the number of factors using the eigenvalue-ratio-based estimator. For the details of estimation, see Cen and Lam (2024).

As an example, the factor loading error measured by column space distance are computed using the ‘fle’ function.

est_result <- miss_factor_est(data_miss, r)
fle(est_result$A[[1]], data_test$A[[1]])
#> [1] 0.01580778

Lastly, we can gauge the imputation performance using relative MSE (rMSE) or even the introduced q-MSE. Setting q as the length of the input vector would essentially output the relative MSE. Examples of rMSE and q-MSE with q as 100 are demonstrated.

qMSE(c(data_test$C), c(est_result$imputation), length(c(data_test$C))) #rMSE
#> [1] 0.0006705599
qMSE(c(data_test$C), c(est_result$imputation), 100) #q-MSE
#> [1] 5.78442e-06

3 Asymptotic normality of estimated loading matrix rows

Under certain conditions (Cen and Lam 2024), the residue between the row of the estimated loading matrix and its corresponding true row under some rotation can be shown to be asymptotically normal. A consistent covariance matrix estimator can be computed by the ‘sigmaD’ function. For instance, to compute the residue on the first row of the estimated mode-2 loading matrix, the covariance matrix estimator is obtained below. The rotation matrix is also computed afterwards.

# computing the covariance matrix estimator
r2 <- r[2]
A2 <- data_test$A[[2]]
beta <- floor(0.2*(TT^0.25 * (d[2])^0.25))
D2 <- diag(x=(svd(est_result$covMatrix[[2]])$d)[1:r2], nrow=r2, ncol=r2)
# HAC_cov: HAC-type covariance matrix estimator
HAC_cov <- sigmaD(2, D2, est_result$A[[2]], est_result$imputation, data_miss, 1, beta)

# computing the rotation matrix
Amk <- data_test$A[[3]] %x% data_test$A[[1]]
R_ast <- 0
for (t in 1:TT){
  R_ast <- R_ast + unfold(data_test$Ft[t,,,],2) %*% t(Amk) %*% Amk %*% t(unfold(data_test$Ft[t,,,],2))
}
R_ast <- A2 %*% R_ast %*% t(A2)
R_ast <- R_ast/TT
Z2 <- diag(x = diag(t(A2) %*% A2), nrow=r2, ncol=r2)
Q2 <- A2 %*% diag(x=diag(solve(Z2))^0.5, nrow=r2, ncol=r2)
# H2: rotation matrix
H2 <- solve(D2) %*% t(est_result$A[[2]]) %*% R_ast %*% Q2 %*% solve(t(Q2)%*% Q2)

Eventually, the standardised residue is shown below and should follow a standard normal distribution when dimensions are increased.

HAC_cov.eigen <- eigen(HAC_cov)
HAC_cov.sqrt <- HAC_cov.eigen$vectors %*% diag(sqrt(HAC_cov.eigen$values)) %*% solve(HAC_cov.eigen$vectors)
A2_1 <- (solve(HAC_cov.sqrt) %*% D2) %*% (matrix(est_result$A[[2]], nrow=d[2], ncol=r2)[1,] - (H2 %*% Q2[1,]))
A2_1
#>           [,1]
#> [1,]  3.482375
#> [2,] -0.951442

References

Cen, Zetai, and Clifford Lam. 2024. “Tensor Time Series Imputation Through Tensor Factor Modelling.”
Kolda, Tamara G., and Brett W. Bader. 2009. “Tensor Decompositions and Applications.” SIAM Review 51 (3): 455–500.