- `read_bim`, `read_fam`, `read_ind`, and `read_snp` functions.
- `write_bed` written in Rcpp and thoroughly tested against the `BEDMatrix` package.
- `write_bed` error message for invalid data, documentation.
- `write_bed` tests.
- `write_fam`, `write_bim`, `write_ind`, and `write_snp` functions.
- `read_*` code, updated docs and tests.
- `make_fam`, `make_bim`, and `write_plink` functions.
- Fixed `read_fam` bug (used to require phenotypes to be integers, now can be double numbers).
- Added `verbose` option to `write_bed`.
- `write_plink` now returns `NULL` invisibly.
- `require_files_plink`, `delete_files_plink`.
- `ind_to_fam`, `sex_to_int`, `sex_to_char`.
- `read_bed` and `read_plink`! Now all Plink reading and writing operations are supported.
- `BEDMatrix`, `snpStats`, and `lfa`.
- `read_plink` now includes row and column names automatically.
- `read_bed` accepts either row and column names or just their numbers.
- `write_plink` checks these row and column names against the BIM and FAM tables for consistency, if these are all present.
- `BEDMatrix` in testing, since it leaves temporary files open and on Windows they do not get deleted and leave confusing error messages behind.
- Added `include <cerrno>` to my cpp code.
- `read_phen` and `write_phen`, a phenotype format (very similar to Plink’s FAM) used by GCTA and EMMAX.
- `write_plink` returns the data it wrote, invisibly as a list. Most useful for auto-generated data.
- `man/figures/`
- Added `tidy_kinship` to transform a square symmetric matrix into a long-format table that is easy to sort and add annotations to.
- Added `read_grm` and `write_grm` to read and write GCTA’s binary genetic relatedness matrix (GRM) format.
- `require_files_grm`, `delete_files_grm`, `require_files_phen`, and `delete_files_phen`.
- `validate_tab_generic`.
- `write_plink`, `write_bed`, and `write_bim` now have an `append` option, for writing extremely large files in parts.
- Added `write_eigenvec` and `read_eigenvec` to read and write Plink/GCTA eigenvector files.
- `count_lines` uses C++ code (via Rcpp) to count file lines extremely quickly. Intended for counting numbers of individuals (from FAM and equivalent files) or numbers of loci (from BIM and equivalent files) when these files are extremely large and no other information is needed from those files.
- `read_eigenvec` added Plink 2 support via the `comment` option, which by default now treats data after `#` as comments. This enables automatically parsing eigenvec files generated by Plink 2, whose header line starts with `#` (this header is ignored). Previously, parsing Plink 2 eigenvec files generated warnings and resulted in the first row being an additional row with all `NA` values.
- `read_bed` added a missing file check in R code.
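
As a rough illustration of the `comment` behavior described above for `read_eigenvec`, the base R sketch below reads a small eigenvec-style file whose header line starts with `#`. It uses `read.table` as a stand-in and is not `genio`’s implementation; the file contents are made up for the example.

```r
# Write a tiny eigenvec-like file in the Plink 2 style (made-up values).
file_eigenvec <- tempfile(fileext = '.eigenvec')
writeLines(
    c(
        '#FID IID PC1 PC2',
        'fam1 id1 0.1 -0.2',
        'fam2 id2 -0.3 0.4'
    ),
    file_eigenvec
)

# Treat everything after "#" as a comment, so the header line is skipped
# instead of being parsed as a data row.
eigenvec <- read.table(
    file_eigenvec,
    comment.char = '#',
    stringsAsFactors = FALSE
)

# Only the two data rows remain, and the PC columns parse as numbers.
print(eigenvec)
```

Skipping the header this way loses the column names, which matches the note above that the header is ignored.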
- Removed `lfa` comparison.
  - `lfa` fork doesn’t have function `read.bed` anymore, previously the slowest and most memory-hungry competitor, which `genio::read_plink` was being compared to.
- `genio` package doc
- Added `geno_to_char` to convert genotype numeric codes (allele dosages such as 0, 1, 2) into character codes such as ‘A/A’, ‘A/G’, ‘G/G’ (depending on locus).
- `read_matrix` and `write_matrix`, intended for admixture inference data.
- `read_bed`, which previously incorrectly stated that the numerical genotypes (allele dosages) counted alternative alleles (allele 2 in BIM table), whereas the truth is that they count reference alleles (allele 1).
- `count_lines` now returns value as integer instead of double (a very minor bug/annoyance fix).
- Removed `lfa` from suggested packages (no connection anymore since `lfa` comparison was removed from vignette in version 1.0.19.9000).
- `read_bed` now reads `file` even if it doesn’t have a BED extension (as long as it exists).
  - Added `ext` option.
- Updated documentation of `read_*` functions to clarify behavior regarding `file` and `ext` options.
- Renamed `real_path` to `add_ext_read` to make the distinction from `add_ext` clearer.
- `read_*` functions use `add_ext_read` while all `write_*` functions use `add_ext`. Only function `count_lines` switched from `add_ext` to `add_ext_read` (in addition to `read_bed`, which led to the earlier change), but `count_lines` didn’t have a default extension so this change is less likely to matter.
- Edited `NEWS.md` slightly to improve its automatic parsing.
- `read_bed` and `read_plink` no longer stop with an error if the input BED file has non-zero padding bits.
  - The `plink2` binary and the `BEDMatrix` R package load this file without complaining about the non-zero pads, so I decided to match that behavior. I verified that `genio`’s data agrees with `BEDMatrix` after the fix.
- `write_bed/plink` with `append = TRUE` debugged to write in “binary” mode.
  - `append` option was introduced in 1.0.15.9000 (2020-07-03).
- Replaced `readr::read_table2` with `readr::read_table`.
  - `read_table2()` was deprecated in readr 2.0.0. Please use `read_table()` instead.
  - `readr` (>= 2.0.0, already on CRAN).
- Replaced `pryr::object_size` with `lobstr::obj_size` (a suggested package used in vignette only; the former was recently superseded by the latter).
  - `pryr::object_size` output (now of class `lobstr_bytes`), which triggered a CRAN warning.
- `read_eigenvec` fixed this warning: “The `value` argument of `names<-` must be a character vector as of tibble 3.0.0.”
- `write_bed`, `write_plink`, and `count_lines` fixed a bug: write (or read) failed if the path started with “~/” on Unix systems. The problem was that the path wasn’t expanded in C++ code.
For example, `write_plink( '~/test', X )` failed with message:
Writing: ~/test.bed
Error in write_bed_cpp(file, X, append = append) :
Could not open BED file `~/test.bed` for writing: No such file or directory
Calls: write_plink -> write_bed -> write_bed_cpp
Execution halted
Thanks to Bingsong Zhang for reporting the bug!
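
The entry above does not show how the path is now expanded. A minimal sketch of one way to do it on the R side, assuming base R’s `path.expand` and a hypothetical helper name `expand_output_path` (not a `genio` function):

```r
# Hypothetical helper, not part of genio: build the output path and expand
# a leading "~" in R, so compiled code receives a path it can open directly.
expand_output_path <- function(file, ext = 'bed') {
    # add the extension first, mirroring the "~/test" -> "~/test.bed" example
    file <- paste0(file, '.', ext)
    # path.expand replaces a leading "~" with the user's home directory
    # and leaves other paths unchanged
    path.expand(file)
}

# The value from the bug report; the result depends on the home directory.
path_bed <- expand_output_path('~/test')
print(path_bed)
```

Paths without a leading tilde pass through unchanged, so the helper is safe to apply to every input.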
- `read_eigenvec` and `write_eigenvec` have new option `plink2` for better handling files with headers in the default style of plink2.
- `count_lines` and all `read_*` functions, which use `add_ext_read` internally to sort out file paths:
  - `ext = NA` finds files that end in a `.gz` extension that was not specified (before those files were incorrectly not found).
  - `read_matrix( 'my-file', ext = NA )` now finds and reads `my-file.gz` if it exists and `my-file` does not exist.
- `README` fixed GitHub installation instructions to build vignette, explained how to view it.
- `read_grm` added several options to facilitate reading GRM-like formats produced by `plink2`, particularly data produced by `--make-king` with `bin` or `bin4` options. Added options:
  - `ext` to specify alternate shared extensions (like “grm” or “king”).
  - `shape` to specify whether the input is a full “square” matrix, a “triangle” with diagonal (default for GRM) or a “strict” triangle without diagonal (for KING-robust).
  - `size_bytes` to parse `bin4`/GRM (4) or `bin` (8) plink2 data.
  - `comment` to control comment characters in the `<ext>.id` file.
- `vec_to_mat_sym` and `mat_sym_to_vec` added option `strict` to exclude diagonal in their transformations.
- `read_tab_generic` added option `comment` to set comment characters.
- `write_grm` added the same options added yesterday to `read_grm` (see there) to write GRM-like formats produced by `plink2`, particularly data produced by `--make-king` with `bin` or `bin4` options.
- `read_grm` edited documentation only, particularly added parsing examples for various `plink2 --make-king` outputs.
- `write_bed` now checks if output directory exists prior to attempting to open the file for writing in the C++ part of the code.
The original code crashed “ruthlessly” in RStudio if the path contained a directory that did not exist, triggering an error such as this one on a terminal:
*** buffer overflow detected ***: terminated
Aborted (core dumped)
The new code produces an ordinary (fatal) error message in R without the buffer overflow.
Bug reported by Richel Bilderbeek (thanks!)
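
A check of this kind can be sketched in plain R. The helper name `check_output_dir` is hypothetical, and the actual check in `genio` may be implemented differently:

```r
# Hypothetical pre-flight check, not genio's actual code: stop with an
# ordinary R error if the directory of the output file does not exist.
check_output_dir <- function(file) {
    dir_out <- dirname(file)
    if (!dir.exists(dir_out))
        stop('Output directory does not exist: ', dir_out)
    # return the path invisibly so the call can be chained
    invisible(file)
}

# A path inside an existing directory passes the check.
file_ok <- file.path(tempdir(), 'test.bed')
check_output_dir(file_ok)

# A path inside a missing directory produces a regular, catchable error.
file_bad <- file.path(tempdir(), 'no-such-dir', 'test.bed')
result <- tryCatch(
    check_output_dir(file_bad),
    error = function(e) conditionMessage(e)
)
print(result)
```

Running the check before any file is opened means the failure surfaces as a normal R condition that callers can handle with `tryCatch`.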
- `read_bim`, `write_bim`, and `geno_to_char`: reversed columns “ref” and “alt” in BIM table.
  - `read_bim` now returns a tibble with allele names “alt” and “ref” in that order (columns still ordered as they appear in input file).
  - `write_bim` writes tables with column “alt” before “ref”.
  - `geno_to_char` reverses the role of “alt” and “ref” correspondingly so that the output remains the same as before these changes (the original outputs were correct as validated against the plink1 “ped” text genotypes).
- `cran-comments.md`
- `write_plink` added option `write_phen` to streamline writing simulation outputs more (as phen files are often required).
- `sprintf` usage (see below).
Solution was to replace calls to `sprintf`, all of which then went to `stop`, with direct calls to `stop`.
* checking compiled code ... WARNING
File ‘genio/libs/genio.so’:
Found ‘sprintf’, possibly from ‘sprintf’ (C)
Objects: ‘read_bed_cpp.o’, ‘write_bed_cpp.o’
Compiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console, nor use Fortran I/O
nor system RNGs nor [v]sprintf.
cran-comments.md