Read and write csv tables annotated with metadata according to the “CSV on the Web” standard (CSVW).
The csvw model for tabular data describes how to annotate a group of csv tables to ensure they are interpreted correctly.
This package uses the csvw metadata schema to find tables, identify column names and cast values to the correct types.
The aim is to reduce the amount of manual work needed to parse and prepare data before it can be used in analysis.
You can use csvwr to read a csv table with json annotations into a data frame:
library(csvwr)
# Parse a csv table using json metadata:
csvw <- read_csvw("data.csv", "metadata.json")
# To extract the parsed table (with syntactic variable names and typed-columns):
csvw$tables[[1]]$dataframe
Alternatively, you can jump straight to the parsed table in one call:
read_csvw_dataframe("data.csv", "metadata.json")
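For illustration, a metadata.json file for the calls above might look something like the following. This is a minimal sketch rather than the package's own example: it assumes data.csv has two columns, here given the hypothetical names id and amount, and it uses only the url, tableSchema, columns, name, titles and datatype properties defined by the csvw metadata vocabulary.

```json
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "data.csv",
  "tableSchema": {
    "columns": [
      {"name": "id", "titles": "id", "datatype": "string"},
      {"name": "amount", "titles": "amount", "datatype": "integer"}
    ]
  }
}
```

With annotations like these, the amount column would be expected to arrive in the data frame as integers rather than as text.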
You can also prepare annotations for a data frame:
# Given a data frame (saved as a csv)
d <- data.frame(x=c("a","b","c"), y=1:3)
write.csv(d, "table.csv", row.names=FALSE)
# Derive a schema
s <- derive_table_schema(d)
# Create metadata (as a list)
m <- create_metadata(tables=list(list(url="table.csv", tableSchema=s)))
# Serialise the metadata to JSON
j <- jsonlite::toJSON(m)
# Write the json to a file
cat(j, file="metadata.json")
For a complete introduction to the library please see the vignette("read-write-csvw").
You can install the latest release from CRAN:
install.packages("csvwr")
Or for the development version you can use devtools to install csvwr from GitHub:
install.packages("devtools")
devtools::install_github("Robsteranium/csvwr")
Broadly speaking, the objectives are as follows:
It’s not an urgent objective for the library to perform csv2rdf or csv2json translation although some support for csv2json is provided as this is used to test that the parsing is done correctly.
In terms of the csvw test cases provided by the standard, the following areas need to be addressed (in rough priority order):
readr::read_csv (and indeed utils::read.csv) accepts URIs, but the spec also involves link, dialect, and content-type headers.
The project currently incorporates two main parts of the csvw test suite:
In each case, we run only the subset of test entries that can be expected to pass given the parts of the standard implemented so far. Some entries will be skipped, either permanently or while other priorities are implemented.
You can find out what needs to be implemented next by widening the subset to include the next entry.
During development, you may find it convenient to recreate one of the test entries for exploration. There is a convenience function in tests/csvw-tests-helpers.R. This isn’t exported by the package so you’ll need to evaluate it explicitly. You can then use it as follows:
run_entry_in_dev(16) # index number in the list of entries
run_entry_in_dev(id="manifest-json#test023") # identifier for the test
There are also some more in-depth unit tests written for this library.
We use GitHub Actions to test the package against multiple architectures and the current, previous and development versions of R.
If you need to test against R-devel locally then you can use the r-devel.Dockerfile:
docker build -f r-devel.Dockerfile . --tag csvw-devel
docker run --rm "csvw-devel"
You can use devtools::load_all() (CTRL + SHIFT + L in RStudio) to load updates and testthat::test_local() (CTRL + SHIFT + T) to run the tests.
To check the vignettes, you need to run devtools::install(build_vignettes=TRUE). You can then open, for example, vignette("read-write-csvw").
GPL-3
To discuss other licensing terms, please get in contact.
There’s another R implementation of csvw in the package rcsvw.
If you’re interested in csvw more generally, then the RDF::Tabular ruby gem provides one of the more robust and comprehensive implementations, supporting both translation and validation.
If you’re specifically interested in validation, take a look at the ODI’s csvlint which implements csvw and also the OKFN’s frictionless data table schemas.
If you want rdf translation, then you might like to check out Swirrl’s csv2rdf and also table2qb which generates csvw annotations from csv files to describe RDF Data Cubes.