Preparation of the example data

The example data used in this package were originally published by Yamanishi et al. (2008). They used the KEGG database to obtain information on drug-target interactions for different groups of enzymes. We used their supplementary material as the basis for the example data provided with the package. Their supplementary datasets are available online.

Processing the drug similarities

In the original paper the authors relied on the SIMCOMP algorithm. This method returns non-symmetric matrices, so the original similarities cannot be used in a meaningful way for a two-step kernel ridge regression. We therefore recreated the similarities between the different drugs, this time using the algorithms provided in the fmcsR package v1.20.0. The code used to obtain and process the drug similarities is heavily based on code kindly provided by Dr. Thomas Girke on the BioConductor support forum.
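
Why symmetry matters can be checked directly in base R. The following is a minimal sketch on a made-up 3 x 3 matrix; the values and the names `simAsym` and `simSym` are hypothetical and not taken from the original data. Averaging a matrix with its transpose is shown only as a generic illustration of what a symmetric matrix looks like. It is not the approach taken for the package data, where the similarities were recomputed instead.

```r
# Toy similarity matrix with hypothetical values. Note that
# simAsym["d1", "d2"] differs from simAsym["d2", "d1"].
simAsym <- matrix(c(1.0, 0.8, 0.3,
                    0.6, 1.0, 0.5,
                    0.3, 0.4, 1.0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("d1", "d2", "d3"),
                                  c("d1", "d2", "d3")))

# A kernel matrix has to be symmetric; this one is not.
isSymmetric(simAsym)

# Generic illustration only: averaging with the transpose
# yields a symmetric matrix.
simSym <- (simAsym + t(simAsym)) / 2
isSymmetric(simSym)
```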

Obtaining the data

To read in the structural data for all compounds we create a function that constructs the actual link and retrieves the data from KEGG. This function is based on the tools provided in the ChemmineR package v2.30.2:

library(ChemmineR)
importKEGG <- function(ids){
  sdfset <- SDFset() # creates an empty SDF set
  
  # We use the link format for obtaining the data
  urlp <- "http://www.genome.jp/dbget-bin/www_bget?-f+m+drug+"
  
  # Combine everything in an sdfset
  for(i in ids){
    url <- paste0(urlp, i)
    tmp <- as(read.SDFset(url), "SDFset")
    cid(tmp) <- i
    sdfset <- c(sdfset, tmp)
  }
  return(sdfset)
}
# Now read the SDF information for all compounds in the research
keggsdf <- importKEGG(colnames(drugTargetInteraction))
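
Before starting a long download it can help to inspect the links that will be requested. The sketch below only builds the URLs and does not contact KEGG. The helper `buildKeggUrls` and the identifiers in `exampleIds` are hypothetical and serve purely as illustration; the link prefix is the one used in `importKEGG()`.

```r
# Same link prefix as used in importKEGG().
urlp <- "http://www.genome.jp/dbget-bin/www_bget?-f+m+drug+"

# Hypothetical helper: build the download links for a vector of IDs
# without retrieving anything.
buildKeggUrls <- function(ids){
  urls <- paste0(urlp, ids)
  names(urls) <- ids
  urls
}

# Example identifiers, used here purely for illustration.
exampleIds <- c("D00002", "D00005")
buildKeggUrls(exampleIds)
```

Because `paste0()` is vectorized, the helper needs no loop; the loop in `importKEGG()` is only required for reading and combining the SDF data.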

Calculating the similarities

The fmcs function in the fmcsR package computes a similarity score between two compounds. It returns several similarity measures, including the Tanimoto coefficient. This coefficient has been shown to be a valid kernel for chemical similarities (Ralaivola et al, 2005; Bajusz et al, 2015), so in this example we continue with the Tanimoto coefficients.
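
As a rough illustration of what the coefficient measures, the sketch below assumes the common definition for substructure-based comparisons: the size of the shared substructure divided by the combined size of both compounds minus the shared part. The function `tanimotoMCS` and its argument names are hypothetical; the fmcsR documentation is the authority on the exact definition used there.

```r
# Hypothetical helper, assuming the usual definition
#   Tanimoto = nCommon / (nA + nB - nCommon)
# where nA and nB are the sizes (e.g. atom counts) of the two compounds
# and nCommon is the size of their common substructure.
tanimotoMCS <- function(nA, nB, nCommon){
  # The common substructure cannot be larger than either compound.
  stopifnot(all(nCommon <= pmin(nA, nB)))
  nCommon / (nA + nB - nCommon)
}

tanimotoMCS(nA = 10, nB = 8, nCommon = 6)  # 6 / (10 + 8 - 6) = 0.5
tanimotoMCS(nA = 7,  nB = 7, nCommon = 7)  # identical sizes, full overlap: 1
```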

library(fmcsR)
# Keep in mind this needs some time to run!
drugSim <- sapply(cid(keggsdf),
                  function(x){
                    fmcsBatch(keggsdf[x], keggsdf,
                              au = 0, bu = 0)[,"Tanimoto_Coefficient"]
                  })
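
The `sapply()` call above builds the similarity matrix one column per compound. The sketch below mimics that pattern on toy data so the shape of the result can be inspected without running the lengthy computation. Everything in it is a stand-in: `toyCompounds`, `toyBatch` and `toySim` are hypothetical names, and a simple set-based Tanimoto replaces the maximum common substructure search done by `fmcsBatch()`.

```r
# Hypothetical stand-in data: each "compound" is a set of feature labels.
toyCompounds <- list(
  D1 = c("a", "b", "c"),
  D2 = c("b", "c", "d"),
  D3 = c("a", "b", "c", "d")
)

# Set-based Tanimoto of one compound against all compounds
# (a simplified stand-in for fmcsBatch(), not the MCS algorithm).
toyBatch <- function(query, all){
  sapply(all, function(y){
    length(intersect(query, y)) / length(union(query, y))
  })
}

# Same pattern as above: one column of similarities per compound.
toySim <- sapply(names(toyCompounds), function(x){
  toyBatch(toyCompounds[[x]], toyCompounds)
})

# Checks one might run on the real drugSim matrix as well:
dim(toySim)               # square, one row and column per compound
isSymmetric(toySim)       # required for use as a kernel matrix
all(diag(toySim) == 1)    # each compound is identical to itself

# Smallest eigenvalue; a kernel matrix should have no negative ones
# (up to numerical error).
min(eigen(toySim, symmetric = TRUE, only.values = TRUE)$values)
```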

All data are stored in the package and, once the package is attached, can be loaded using

data(drugtarget)


Joris Meys

2020-02-03
