LBDiscover is an R package for literature-based discovery (LBD) in biomedical research. It provides a comprehensive suite of tools for retrieving scientific articles, extracting biomedical entities, building co-occurrence networks, and applying various discovery models to uncover hidden connections in the scientific literature.
The package implements several literature-based discovery approaches, including the ABC model, the AnC model, and Latent Semantic Indexing (LSI).
LBDiscover also features powerful visualization tools for exploring discovered connections using networks, heatmaps, and interactive diagrams.
# Install from CRAN
install.packages("LBDiscover")
# Or install the development version from GitHub
# install.packages("devtools")
devtools::install_github("chaoliu-cl/LBDiscover")
LBDiscover provides a complete workflow for literature-based discovery:
library(LBDiscover)
# Retrieve articles from PubMed
articles <- pubmed_search("migraine treatment", max_results = 100)
# Preprocess article text
preprocessed <- vec_preprocess(
  articles,
  text_column = "abstract",
  remove_stopwords = TRUE
)
# Extract biomedical entities
entities <- extract_entities_workflow(
  preprocessed,
  text_column = "abstract",
  entity_types = c("disease", "drug", "gene")
)
# Create co-occurrence matrix
co_matrix <- create_comat(
  entities,
  doc_id_col = "doc_id",
  entity_col = "entity",
  type_col = "entity_type"
)
# Apply the ABC model to find new connections
abc_results <- abc_model(
  co_matrix,
  a_term = "migraine",
  n_results = 50,
  scoring_method = "combined"
)
# Visualize the results
vis_abc_network(abc_results, top_n = 20)
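To make the co-occurrence step in the workflow above more concrete, here is a minimal base-R sketch of how a co-occurrence matrix can be derived from a long table of entity mentions. It does not reproduce what `create_comat()` does internally; the toy data and the names `entities_toy`, `incidence`, and `co_toy` are assumptions made for illustration.

```r
# Toy entity table: one row per (document, entity) mention.
# The column names mirror the doc_id_col / entity_col arguments used above;
# the data themselves are made up.
entities_toy <- data.frame(
  doc_id = c(1, 1, 2, 2, 3, 3),
  entity = c("migraine", "serotonin", "serotonin", "sumatriptan",
             "migraine", "magnesium"),
  stringsAsFactors = FALSE
)

# Document-by-entity incidence matrix: 1 if the entity appears in the document.
incidence <- (unclass(table(entities_toy$doc_id, entities_toy$entity)) > 0) * 1

# Entity-by-entity co-occurrence: number of documents each pair shares.
co_toy <- crossprod(incidence)
diag(co_toy) <- 0  # ignore an entity co-occurring with itself

print(co_toy)
```

Each off-diagonal cell counts the documents in which two entities appear together, which is the kind of input the discovery models below operate on.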
The ABC model is based on Swanson’s discovery paradigm. If concept A is related to concept B, and concept B is related to concept C, but A and C are not directly connected in the literature, then A may have a hidden relationship with C.
# Apply the ABC model
abc_results <- abc_model(
  co_matrix,
  a_term = "migraine",
  min_score = 0.1,
  n_results = 50
)
# Visualize as a network
vis_abc_network(abc_results)
# Or as a heatmap
vis_heatmap(abc_results)
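The A-B-C reasoning can also be traced by hand on a small matrix. The sketch below is a simplified base-R illustration, not the implementation behind `abc_model()`; the toy counts and the scoring rule (the weaker of the A-B and B-C links) are assumptions chosen only to keep the example short.

```r
# Toy symmetric co-occurrence matrix (made-up counts, for illustration only).
terms <- c("migraine", "serotonin", "magnesium", "sumatriptan", "riboflavin")
co_toy <- matrix(0, nrow = 5, ncol = 5, dimnames = list(terms, terms))
co_toy["migraine", "serotonin"]    <- 4
co_toy["migraine", "magnesium"]    <- 2
co_toy["serotonin", "magnesium"]   <- 1
co_toy["serotonin", "sumatriptan"] <- 3
co_toy["magnesium", "riboflavin"]  <- 1
co_toy <- co_toy + t(co_toy)  # fill in the lower triangle

a_term <- "migraine"

# B terms: everything that co-occurs directly with A.
b_terms <- names(which(co_toy[a_term, ] > 0))

# C candidates: terms linked to a B term but never directly to A.
abc <- do.call(rbind, lapply(b_terms, function(b) {
  c_terms <- setdiff(names(which(co_toy[b, ] > 0)), c(a_term, b_terms))
  if (length(c_terms) == 0) return(NULL)
  data.frame(
    a = a_term, b = b, c = c_terms,
    # Assumed score: the weaker of the two links on the A-B-C path.
    score = unname(pmin(co_toy[a_term, b], co_toy[b, c_terms])),
    stringsAsFactors = FALSE
  )
}))

abc <- abc[order(-abc$score), ]
print(abc)
```

Terms that already co-occur with the A term are excluded from the C candidates, since the goal is to surface connections that are absent from the literature.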
The AnC model is an extension of the ABC model that uses multiple B terms to establish stronger connections between A and C.
# Apply the AnC model
anc_results <- anc_model(
  co_matrix,
  a_term = "migraine",
  n_b_terms = 5,
  min_score = 0.1
)
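The idea of pooling evidence across several B terms can be sketched in base R as follows. This is one plausible reading of the description above, not the algorithm used by `anc_model()`: the toy counts, the path score (minimum of the A-B and B-C links), and the ranking by number of supporting B terms are all assumptions.

```r
# Toy symmetric co-occurrence matrix (made-up counts, for illustration only).
terms <- c("migraine", "serotonin", "magnesium", "calcium",
           "sumatriptan", "riboflavin")
co_toy <- matrix(0, 6, 6, dimnames = list(terms, terms))
co_toy["migraine", c("serotonin", "magnesium", "calcium")] <- c(4, 2, 1)
co_toy["serotonin", "sumatriptan"] <- 3
co_toy["magnesium", c("sumatriptan", "riboflavin")] <- c(1, 2)
co_toy["calcium", "riboflavin"] <- 1
co_toy <- co_toy + t(co_toy)

a_term <- "migraine"
n_b_terms <- 2  # keep only the strongest B terms, as in the call above

# Rank the direct neighbours of A and keep the top n_b_terms as B terms.
a_links <- co_toy[a_term, ]
a_links <- sort(a_links[a_links > 0], decreasing = TRUE)
b_terms <- names(head(a_links, n_b_terms))

# C candidates: terms with no direct link to A.
c_terms <- setdiff(terms, c(a_term, names(a_links)))

# Strength of each A-B-C path: the weaker of its two links (an assumed rule).
bc <- co_toy[b_terms, c_terms, drop = FALSE]
ab <- matrix(a_links[b_terms], nrow = length(b_terms), ncol = length(c_terms))
path <- pmin(bc, ab)

# Aggregate over B terms: how many B terms support each C, and how strongly.
anc <- data.frame(
  c = c_terms,
  n_b = colSums(path > 0),
  score = colSums(path),
  row.names = NULL,
  stringsAsFactors = FALSE
)
anc <- anc[order(-anc$n_b, -anc$score), ]
print(anc)
```

A C term reached through two independent B terms ranks above one reached through a single B term, which is the sense in which multiple intermediates strengthen a candidate connection.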
The Latent Semantic Indexing model identifies semantically related terms using dimensionality reduction techniques.
# Create term-document matrix
tdm <- create_term_document_matrix(preprocessed)
# Apply LSI model
lsi_results <- lsi_model(
  tdm,
  a_term = "migraine",
  n_factors = 100
)
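For intuition about the dimensionality reduction involved, the following base-R sketch applies a truncated singular value decomposition to a tiny term-document matrix and ranks terms by cosine similarity to the A term. It is a generic LSI illustration under assumed toy data, not the code behind `lsi_model()`; `tdm_toy`, `cosine()`, and the choice of two factors are hypothetical.

```r
# Toy term-document matrix: rows are terms, columns are documents (made-up counts).
tdm_toy <- matrix(
  c(2, 1, 0, 0,
    1, 2, 0, 0,
    0, 0, 3, 1,
    0, 0, 1, 2,
    1, 1, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("migraine", "serotonin", "diabetes", "insulin", "triptan"),
    paste0("doc", 1:4)
  )
)

n_factors <- 2  # number of latent dimensions to keep

# Truncated SVD: keep the first n_factors singular vectors and values.
decomp <- svd(tdm_toy, nu = n_factors, nv = n_factors)

# Term coordinates in the latent space (left singular vectors scaled by singular values).
term_vecs <- decomp$u %*% diag(decomp$d[1:n_factors], nrow = n_factors)
rownames(term_vecs) <- rownames(tdm_toy)

# Cosine similarity between two vectors.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Rank every other term by similarity to the A term.
a_vec <- term_vecs["migraine", ]
sims <- apply(term_vecs, 1, cosine, y = a_vec)
sims <- sort(sims[names(sims) != "migraine"], decreasing = TRUE)
print(round(sims, 3))
```

Terms that share documents with "migraine" end up close to it in the reduced space, while terms from the unrelated group of documents score near zero.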
The package offers multiple visualization options:
# Network visualization
vis_abc_network(abc_results, top_n = 25)
# Heatmap of connections
vis_heatmap(abc_results, top_n = 20)
# Export interactive HTML network
export_network(abc_results, output_file = "abc_network.html")
# Export interactive chord diagram
export_chord(abc_results, output_file = "abc_chord.html")
For an end-to-end analysis:
# Run comprehensive discovery analysis
discovery_results <- run_lbd(
  search_query = "migraine pathophysiology",
  a_term = "migraine",
  discovery_approaches = c("abc", "anc", "lsi"),
  include_visualizations = TRUE,
  output_file = "discovery_report.html"
)
For more detailed documentation and examples, please see the package vignettes:
# View package vignettes
browseVignettes("LBDiscover")
If you use LBDiscover in your research, please cite:
Liu, C. (2025). LBDiscover: Literature-Based Discovery Tools for Biomedical Research.
R package version 0.1.0. https://github.com/chaoliu-cl/LBDiscover
This project is licensed under the GPL-3 License - see the LICENSE file for details.