Internal changes requested by CRAN around C level format strings (#1896).
Fixed tests related to changes to dim<-() in
R-devel (#1889).
Fixed a performance issue with vec_c() and ALTREP
vectors (in particular, the new ALTREP list vectors in R-devel)
(#1884).
Fixed an issue with complex vector tests related to changes in R-devel (#1883).
Added a class to the vec_locate_matches() error that
is thrown when an overflow would otherwise occur (#1845).
Fixed an issue with vec_rank() and 0-column data
frames (#1863).
Fixed an issue where certain ALTREP row names were being
materialized when passed to new_data_frame(). We’ve fixed
this by removing a safeguard in new_data_frame() that
performed a compatibility check when both n and
row.names were provided. Because this is a low level
function designed for performance, it is up to the caller to ensure
these inputs are compatible (tidyverse/dplyr#6596).
Fixed an issue where vec_set_*() used with data
frames could accidentally return an object with the type of the proxy
rather than the type of the original inputs (#1837).
Fixed a rare vec_locate_matches() bug that could
occur when using a max/min filter
(tidyverse/dplyr#6835).
Fixed conditional S3 registration to avoid a CRAN check NOTE that appears in R >=4.3.0 (#1832).
Fixed tests to maintain compatibility with the next version of waldo (#1829).
c.sfc() changes in sf 1.0-10
(#1817).
New vec_run_sizes() for computing the size of each
run within a vector. It is identical to the times column
from vec_unrep(), but is faster if you don’t need the run
key (#1210).
New sizes argument to vec_chop() which
allows you to partition a vector using an integer vector describing the
size of each expected slice. It is particularly useful in combination
with vec_run_sizes() and list_sizes() (#1210,
#1598).
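Here is a small sketch of how the two additions fit together, using made-up data and assuming a vctrs release that includes both entries. vec_run_sizes() returns one size per run, and vec_chop(sizes =) then splits the vector along those runs.

```r
library(vctrs)

x <- c("a", "a", "b", "b", "b", "a")

# One size per run of consecutive equal values
sizes <- vec_run_sizes(x)

# Same numbers as the `times` column of vec_unrep(), without building the keys
stopifnot(identical(sizes, vec_unrep(x)$times))

# Split x into one slice per run, using the sizes computed above
runs <- vec_chop(x, sizes = sizes)
print(sizes)
print(runs)
```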
New obj_is_vector(),
obj_check_vector(), and vec_check_size()
validation helpers. We believe these are a better approach to vector
validation than vec_assert() and vec_is(),
which have been marked as questioning because the semantics of their
ptype arguments are hard to define and can often be
replaced by vec_cast() or a type predicate function like
rlang::is_logical() (#1784).
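A minimal sketch of the new helpers, using toy inputs. The tryCatch() wrapper exists only to show that a size mismatch raises an error. In real code you would let the error propagate.

```r
library(vctrs)

# Predicate: is each object a vector in the vctrs sense?
is_vec <- c(obj_is_vector(1:3), obj_is_vector(NULL), obj_is_vector(mean))
print(is_vec)

# Checkers return silently on success and throw an error otherwise
obj_check_vector(letters)
vec_check_size(1:3, size = 3)

size_check <- tryCatch(
  {
    vec_check_size(1:3, size = 2)
    "ok"
  },
  error = function(e) "size mismatch"
)
print(size_check)
```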
vec_is_list() and vec_check_list() have
been renamed to obj_is_list() and
obj_check_list(), in line with the new
obj_is_vector() helper. The old functions have been
silently deprecated, but an official deprecation process will start in
the next vctrs release (#1803).
vec_locate_matches() gains a new
relationship argument that holistically handles multiple
matches between needles and haystack. In
particular, relationship = "many-to-one" replaces
multiple = "error" and multiple = "warning",
which have been removed from the documentation and silently
soft-deprecated. Official deprecation for those options will start in a
future release (#1791).
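The difference between the default behaviour and relationship = "many-to-one" can be sketched with toy vectors. The error is caught only so that the example can display it.

```r
library(vctrs)

needles <- c(1, 2)
haystack <- c(1, 1, 2)

# Default: all matches are returned, so needle 1 pairs with haystack rows 1 and 2
m <- vec_locate_matches(needles, haystack)
print(m)

# "many-to-one" allows each needle at most one haystack match,
# so the duplicated 1 in haystack triggers an error
strict <- tryCatch(
  vec_locate_matches(needles, haystack, relationship = "many-to-one"),
  error = function(e) "relationship violated"
)
print(strict)
```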
vec_locate_matches() has changed its default
needles_arg and haystack_arg values from
"" to "needles" and "haystack",
respectively. This generally generates more informative error messages
(#1792).
vec_chop() has gained empty ... between
x and the optional indices argument. For
backwards compatibility, supplying vec_chop(x, indices)
without naming indices still silently works, but will be
deprecated in a future release (#1813).
vec_slice() has gained an error_call
argument (#1785).
The numeric_version type from base R is now better
supported in equality, comparison, and order based operations
(tidyverse/dplyr#6680).
R >=3.5.0 is now explicitly required. This is in line with the tidyverse policy of supporting the 5 most recent versions of R.
New vec_expand_grid(), which is a lower level helper
that is similar to tidyr::expand_grid() (#1325).
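A quick illustration with toy inputs. As with tidyr::expand_grid(), the first input varies slowest.

```r
library(vctrs)

# Every combination of x and y; the first input varies slowest
grid <- vec_expand_grid(x = 1:2, y = c("a", "b"))
print(grid)
```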
New vec_set_intersect(),
vec_set_difference(), vec_set_union(), and
vec_set_symmetric_difference() which compute set operations
like intersect(), setdiff(), and
union(), but the vctrs variants don’t strip attributes and
work with data frames (#1755, #1765).
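The sketch below illustrates both points with made-up values. The Date class survives the union, and data frames are compared row by row.

```r
library(vctrs)

# Attributes such as the Date class survive the operation
d1 <- as.Date(c("2020-01-01", "2020-01-02"))
d2 <- as.Date(c("2020-01-02", "2020-01-03"))
u <- vec_set_union(d1, d2)
print(u)

# Data frames are treated row-wise: each row is one element of the set
df1 <- data_frame(a = 1:3, b = c("x", "y", "z"))
df2 <- data_frame(a = 2:4, b = c("y", "q", "w"))
common <- vec_set_intersect(df1, df2)
only_df1 <- vec_set_difference(df1, df2)
print(common)
print(only_df1)
```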
vec_identify_runs() is now faster when used with
data frames (#1684).
The maximum load factor of the internal dictionary was reduced
from 77% to 50%, which improves performance of functions like
vec_match(), vec_set_intersect(), and
vec_unique() in some cases (#1760).
Fixed a bug with the internal vec_order_radix()
function related to matrix columns (#1753).
vctrs is now compliant with -Wstrict-prototypes as
requested by CRAN (#1729).
vec_ptype2() now consistently falls back to bare
data frame in case of incompatible data frame subclasses. This is part
of a general move towards relaxed coercion rules.
Common type and cast errors now inherit from
"vctrs_error_ptype2" and "vctrs_error_cast"
respectively. Both are still subclasses of
"vctrs_error_incompatible_type" (which used to be their
most specific class and is now a parent class).
New list_all_size() and
list_check_all_size() to quickly determine if a list
contains elements of a particular size (#1582).
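A minimal sketch contrasting the predicate with the checker, using a toy list. The checker's error is caught here only for display.

```r
library(vctrs)

x <- list(1:2, c("a", "b"), list(1, 2))

# Predicate form: are all elements of size 2? of size 3?
all_two <- list_all_size(x, 2)
all_three <- list_all_size(x, 3)

# Checker form: errors when any element has the wrong size
check <- tryCatch(
  {
    list_check_all_size(x, 3)
    "ok"
  },
  error = function(e) "wrong size"
)
print(c(all_two, all_three))
print(check)
```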
list_unchop() has gained empty ... to
force optional arguments to be named (#1715).
vec_rep_each(times = 0) now works correctly with
logical vectors that are considered unspecified and with named vectors
(#1673).
list_of() was relaxed to make it easier to combine.
It is now coercible with list() (#1161). When incompatible
list_of() types are combined, the result is now a bare
list().
Following this change, the role of list_of() is mainly
to carry type information for potential optimisations, rather than to
guarantee a certain type throughout an analysis.
validate_list_of() has been removed. It hasn’t
proven to be practically useful, and isn’t used by any packages on CRAN
(#1697).
Directed calls to vec_c(), like
vec_c(.ptype = <type>), now mention the position of
the problematic argument when there are cast errors (#1690).
list_unchop() no longer drops names in some cases
when indices were supplied (#1689).
"unique_quiet" and "universal_quiet"
are newly accepted by vec_as_names(repair =) and
vec_names2(repair =). These options exist to help users who
call these functions indirectly, via another function which only exposes
repair but not quiet. Specifying
repair = "unique_quiet" is like specifying
repair = "unique", quiet = TRUE. When the
"*_quiet" options are used, any setting of
quiet is silently overridden (@jennybc, #1629).
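For example, with made-up names, "unique_quiet" performs the usual "unique" repair but emits no message:

```r
library(vctrs)

# Same repair as repair = "unique", quiet = TRUE
nms <- vec_as_names(c("x", "x", ""), repair = "unique_quiet")
print(nms)
```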
"unique_quiet" and "universal_quiet" are
also newly accepted for the name repair argument of several other
functions that do not expose a quiet argument:
data_frame(), df_list(), vec_c(),
list_unchop(), vec_interleave(),
vec_rbind(), and vec_cbind() (@jennybc, #1716).
list_unchop() has gained error_call and
error_arg arguments (#1641, #1692).
vec_c() has gained .error_call and
.error_arg arguments (#1641, #1692).
Improved the performance of list-of common type methods (#1686, #875).
The list-of method for as_list_of() now places the
optional .ptype argument after the ...
(#1686).
vec_rbind() now applies base::c()
fallback recursively within packed df-cols (#1331, #1462,
#1640).
vec_c(), vec_unchop(), and
vec_rbind() now proxy and restore recursively (#1107). This
prevents vec_restore() from being called with partially
filled vectors and improves performance (#1217, #1496).
New vec_any_missing() for quickly determining if a
vector has any missing values (#1672).
vec_equal_na() has been renamed to
vec_detect_missing() to align better with vctrs naming
conventions. vec_equal_na() will stick around for a few
minor versions, but has been formally soft-deprecated (#1672).
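With data frames, the definitions of "missing" and "complete" are not simple complements. The toy data frame below shows the difference.

```r
library(vctrs)

df <- data_frame(x = c(1, NA, NA), y = c("a", "b", NA))

# A row is "missing" only if every column in it is missing ...
is_miss <- vec_detect_missing(df)
# ... whereas a row is "complete" only if no column in it is missing
is_complete <- vec_detect_complete(df)
# A single TRUE/FALSE answer: is at least one row missing?
any_miss <- vec_any_missing(df)

print(is_miss)
print(is_complete)
print(any_miss)
```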
vec_c(outer = c(inner = 1)) now produces correct
error messages (#522).
If a data frame is returned as the proxy from
vec_proxy_equal(), vec_proxy_compare(), or
vec_proxy_order(), then the corresponding proxy function is
now automatically applied recursively along all of the columns.
Additionally, packed data frame columns will be unpacked, and 1-column
data frames will be unwrapped. This ensures that the simplest possible
types are provided to the native C algorithms, improving both
correctness and performance (#1664).
When used with record vectors, vec_proxy_compare()
and vec_proxy_order() now call the correct proxy function
while recursing over the fields (#1664).
The experimental function vec_list_cast() has been
removed from the package (#1382).
Native classes like dates and datetimes now accept dimensions (#1290, #1329).
vec_compare() now throws a more informative error
when attempting to compare complex vectors (#1655).
vec_rep() and friends gain error_call,
x_arg, and times_arg arguments so they can be
embedded in frontends (#1303).
Record vectors now fail as expected when indexed along dimensions greater than 1 (#1295).
vec_order() and vec_sort() now have
... between the required and optional arguments to make
them easier to extend (#1647).
The S3 vignette was extended to show how to make the polynomial class atomic instead of a list (#1030).
The experimental n argument of
vec_restore() has been removed. It was only used to inform
on the size of data frames in case a bare list is restored. It is now
expected that bare lists are initialised to data frames so that the size
is carried through row attributes. This makes the generic simpler and
fixes some performance issues (#650).
The anyNA() method for vctrs_vctr (and
thus vctrs_list_of) now supports the recursive
argument (#1278).
vec_as_location() and num_as_location()
have gained a missing = "remove" option (#1595).
vec_as_location() no longer matches
NA_character_ and "" indices if those invalid
names appear in names (#1489).
vec_unchop() has been renamed to
list_unchop() to better indicate that it requires list
input. vec_unchop() will stick around for a few minor
versions, but has been formally soft-deprecated (#1209).
Lossy cast errors during scalar subscript validation now have the correct message (#1606).
Fixed confusing error message with logical [[
subscripts (#1608).
New vec_rank() to compute various types of sample
ranks (#1600).
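A short sketch of how the ties argument changes the ranks of a toy vector that contains a repeated value:

```r
library(vctrs)

x <- c(5, 1, 3, 1)

r_min <- vec_rank(x)                        # ties = "min" is the default
r_dense <- vec_rank(x, ties = "dense")      # no gaps after tied values
r_seq <- vec_rank(x, ties = "sequential")   # ties broken by order of appearance

print(r_min)
print(r_dense)
print(r_seq)
```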
num_as_location() now throws the right error when
there are out-of-bounds negative values and oob = "extend"
and negative = "ignore" are set (#1614, #1630).
num_as_location() now works correctly when a
combination of zero = "error" and
negative = "invert" are used (#1612).
data_frame() and df_list() have gained
.error_call arguments (#1610).
vec_locate_matches() has gained an
error_call argument (#1611).
"select" and "relocate" have been added
as valid subscript actions to support tidyselect and dplyr
(#1596).
num_as_location() has a new
oob = "remove" argument to remove out-of-bounds locations
(#1595).
vec_rbind() and vec_cbind() now have
.error_call arguments (#1597).
df_list() has gained a new .unpack
argument to optionally disable data frame unpacking (#1616).
vec_check_list(arg = "") now throws the correct
error (#1604).
The difftime to difftime
vec_cast() method now standardizes the internal storage
type to double, catching potentially corrupt integer storage
difftime vectors (#1602).
vec_as_location2() and
vec_as_subscript2() more correctly utilize their
call arguments (#1605).
vec_count(sort = "count") now uses a stable sorting
method. This ensures that different keys with the same count are sorted
in the order that they originally appeared in (#1588).
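For instance, in this toy input "b" and "a" both appear twice. With a stable sort, "b" is listed first because it appears first in the input.

```r
library(vctrs)

counts <- vec_count(c("b", "a", "a", "b", "c"), sort = "count")
print(counts)
```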
Lossy cast error conditions now show the correct message when
conditionMessage() is called on them (#1592).
Fixed inconsistent reporting of conflicting inputs in
vec_ptype_common() (#1570).
vec_ptype_abbr() and vec_ptype_full()
now suffix 1d arrays with [1d].
vec_ptype_abbr() and vec_ptype_full()
methods are no longer inherited (#1549).
vec_cast() now throws the correct error when
attempting to cast a subclassed data frame to a non-data frame type
(#1568).
vec_locate_matches() now uses a more conservative
heuristic when taking the joint ordering proxy. This allows it to work
correctly with sf’s sfc vectors and the classes from the bignum package
(#1558).
An sfc method for vec_proxy_order() was added to
better support the sf package. These vectors are generally treated like
list-columns even though they don’t explicitly have a
"list" class, and the vec_proxy_order() method
now forwards to the list method to reflect that (#1558).
vec_proxy_compare() now works correctly for raw
vectors wrapped in I(). vec_proxy_order() now
works correctly for raw and list vectors wrapped in I()
(#1557).
OOB errors with character() indexes use “that don’t
exist” instead of “past the end” (#1543).
Fixed memory protection issues related to common type determination (#1551, tidyverse/tidyr#1348).
New experimental vec_locate_sorted_groups() for
returning the locations of groups in sorted order. This is equivalent
to, but faster than, calling vec_group_loc() and then
sorting by the key column of the result.
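The equivalence can be checked on a toy vector. The longer route groups by first appearance and then reorders the groups by key.

```r
library(vctrs)

x <- c("b", "a", "b")

# Groups returned directly in sorted key order
g <- vec_locate_sorted_groups(x)
print(g)

# Equivalent result via vec_group_loc() followed by sorting on the key column
gl <- vec_group_loc(x)
gl <- vec_slice(gl, vec_order(gl$key))
stopifnot(identical(gl$key, g$key))
```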
New experimental vec_locate_matches() for locating
where each observation in one vector matches one or more observations in
another vector. It is similar to vec_match(), but returns
all matches by default (rather than just the first), and can match on
binary conditions other than equality. The algorithm is inspired by
data.table’s very fast binary merge procedure.
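The sketch below shows a non-equality condition on toy vectors. The condition is read as needle > haystack, and every haystack value that satisfies it is returned.

```r
library(vctrs)

# Each needle is matched against every haystack value it is greater than
m <- vec_locate_matches(c(2, 5), c(1, 3, 4, 6), condition = ">")
print(m)
```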
The vec_proxy_equal(),
vec_proxy_compare(), and vec_proxy_order()
methods for vctrs_rcrd are now applied recursively over the
fields (#1503).
Lossy cast errors now inherit from incompatible type errors.
vec_is_list() now returns TRUE for
AsIs lists (#1463).
vec_assert(), vec_ptype2(),
vec_cast(), and vec_as_location() now use
caller_arg() to infer a default arg value from
the caller.
This may result in unhelpful arguments being mentioned in error
messages. In general, you should consider snapshotting vctrs error
messages thrown in your package and supply arg and
call arguments if the error context is not adequately
reported to your users.
vec_ptype_common(), vec_cast_common(),
vec_size_common(), and vec_recycle_common()
gain call and arg arguments for specifying an
error context.
vec_compare() can now compare zero column data
frames (#1500).
new_data_frame() now errors on negative and missing
n values (#1477).
vec_order() now correctly orders zero column data
frames (#1499).
vctrs now depends on cli to help with error message generation.
New vec_check_list() and
list_check_all_vectors() input checkers, and an
accompanying list_all_vectors() predicate.
New vec_interleave() for combining multiple vectors
together, interleaving their elements in the process (#1396).
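For example, interleaving two toy integer vectors alternates their elements:

```r
library(vctrs)

x <- vec_interleave(1:3, 4:6)
print(x)
```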
vec_equal_na(NULL) now returns
logical(0) rather than erroring (#1494).
vec_as_location(missing = "error") now fails with
NA and NA_character_ in addition to
NA_integer_ (#1420, @krlmlr).
Starting with rlang 1.0.0, errors are displayed with the
contextual function call. Several vctrs operations gain a
call argument that makes it possible to report the correct
context in error messages. This concerns:
vec_cast() and vec_ptype2()
vec_default_cast() and vec_default_ptype2()
vec_assert()
vec_as_names()
stop_ constructors like stop_incompatible_type()
Note that default vec_cast() and
vec_ptype2() methods automatically support this if they
pass ... to the corresponding vec_default_
functions. If you throw a non-internal error from a non-default method,
add a call = caller_env() argument in the method and pass
it to rlang::abort().
If NA_character_ is specified as a name for
vctrs_vctr objects, it is now automatically repaired to
"" (#780).
"" is now an allowed name for
vctrs_vctr objects and all its subclasses
(vctrs_list_of in particular) (#780).
list_of() is now much faster when many values are
provided.
vec_as_location() evaluates arg only in
case of error, for performance (#1150, @krlmlr).
levels.vctrs_vctr() now returns NULL
instead of failing (#1186, @krlmlr).
vec_assert() produces a more informative error when
size is invalid (#1470).
vec_duplicate_detect() is a bit faster when there
are many unique values.
vec_proxy_order() is described in
vignette("s3-vectors") (#1373, @krlmlr).
vec_chop() now materializes ALTREP vectors before
chopping, which is more efficient than creating many small ALTREP pieces
(#1450).
New list_drop_empty() for removing empty elements
from a list (#1395).
list_sizes() now propagates the names of the list
onto the result.
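Both list helpers are shown below on a toy named list. Elements of size 0, including NULL, are dropped, and list_sizes() keeps the list's names.

```r
library(vctrs)

x <- list(a = 1:2, b = NULL, c = integer(), d = "x")

# Size-0 elements (including NULL) are removed; names are kept
kept <- list_drop_empty(x)

# One size per element, named after the list elements
sizes <- list_sizes(kept)
print(kept)
print(sizes)
```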
Name repair messages are now signaled by
rlang::names_inform_repair(). This means that the messages
are now sent to stdout by default rather than to stderr, resulting in
prettier messages. Additionally, name repair messages can now be
silenced through the global option
rlib_name_repair_verbosity, which is useful for testing
purposes. See ?names_inform_repair for more information
(#1429).
vctrs_vctr methods for na.omit(),
na.exclude(), and na.fail() have been added
(#1413).
vec_init() is now slightly faster (#1423).
vec_set_names() no longer corrupts
vctrs_rcrd types (#1419).
vec_detect_complete() now computes completeness for
vctrs_rcrd types in the same way as data frames, which
means that if any field is missing, the entire record is considered
incomplete (#1386).
The na_value argument of vec_order()
and vec_sort() now correctly respect missing values in
lists (#1401).
vec_rep() and vec_rep_each() are much
faster for times = 0 and times = 1 (@mgirlich,
#1392).
vec_equal_na() and vec_fill_missing()
now work with integer64 vectors (#1304).
The xtfrm() method for vctrs_vctr objects no longer
accidentally breaks ties (#1354).
min(), max() and range()
no longer throw an error if na.rm = TRUE is set and all
values are NA (@gorcha, #1357). In this case, and where an
empty input is given, they return Inf/-Inf,
or NA if Inf can’t be cast to the input
type.
vec_group_loc(), used for grouping in dplyr, now
correctly handles vectors with billions of elements (up to
.Machine$integer.max) (#1133).
vec_ptype_abbr() gains arguments to control whether
to indicate named vectors with a prefix (prefix_named) and
indicate shaped vectors with a suffix (suffix_shape) (#781,
@krlmlr).
vec_ptype() is now an optional performance
generic. It is not necessary to implement, but if your class has a
static prototype, you might consider implementing a custom
vec_ptype() method that returns a constant to improve
performance in some cases (such as common type imputation).
New vec_detect_complete(), inspired by
stats::complete.cases(). For most vectors, this is
identical to !vec_equal_na(). For data frames and matrices,
this detects rows that only contain non-missing values.
vec_order() can now order complex vectors
(#1330).
Removed dependency on digest in favor of
rlang::hash().
Fixed an issue where vctrs_rcrd objects were not
being proxied correctly when used as a data frame column
(#1318).
register_s3() is now licensed with the “unlicense”
which makes it very clear that it’s fine to copy and paste into your own
package (@maxheld83, #1254).
Fixed an issue with tibble 3.0.0, which deprecates removing column names with
names(x) <- NULL (#1298).
Fixed a GCC 11 issue revealed by CRAN checks.
New experimental vec_fill_missing() for filling in
missing values with the previous or following value. It is similar to
tidyr::fill(), but also works with data frames and has an
additional max_fill argument to limit the number of
sequential missing values to fill.
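A sketch of the three behaviours on a toy double vector. Note that a trailing NA has no following value to fill from when direction = "up".

```r
library(vctrs)

x <- c(1, NA, NA, NA, 5, NA)

down <- vec_fill_missing(x)                    # direction = "down" by default
up <- vec_fill_missing(x, direction = "up")
limited <- vec_fill_missing(x, max_fill = 1)   # fill at most 1 value per gap

print(down)
print(up)
print(limited)
```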
New vec_unrep() to compress a vector with repeated
values. It is very similar to run length encoding, and works nicely
alongside vec_rep_each() as a way to invert the
compression.
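The round trip can be sketched on a toy character vector. vec_unrep() returns a data frame with key and times columns, and vec_rep_each() expands it back.

```r
library(vctrs)

x <- c("a", "a", "b", "c", "c", "c")

# Compress: one row per run, with the run value and its length
runs <- vec_unrep(x)
print(runs)

# Invert the compression with vec_rep_each()
restored <- vec_rep_each(runs$key, runs$times)
```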
vec_cbind() with only empty data frames now
preserves the common size of the inputs in the result (#1281).
vec_c() now correctly returns a named result with
named empty inputs (#1263).
vctrs has been relicensed as MIT (#1259).
Functions that make comparisons within a single vector, such as
vec_unique(), or between two vectors, such as
vec_match(), now convert all character input to UTF-8
before making comparisons (#1246).
New vec_identify_runs() which returns a vector of
identifiers for the elements of x that indicate which run
of repeated values they fall in (#1081).
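For example, on a toy vector a value that reappears after a different value starts a new run. The total number of runs is stored in the "n" attribute.

```r
library(vctrs)

ids <- vec_identify_runs(c("a", "a", "b", "a"))
print(ids)
```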
Fixed an encoding translation bug with lists containing data
frames which have columns where vec_size() is different
from the low level Rf_length() (#1233).
The table class is now implemented as a wrapper type
that delegates its coercion methods. It used to be restricted to integer
tables (#1190).
Named one-dimensional arrays now behave consistently with simple
vectors in vec_names() and
vec_rbind().
new_rcrd() now uses df_list() to
validate the fields. This makes it more flexible as the fields can now
be of any type supported by vctrs, including data frames.
Thanks to the previous change, the [[ method of
records now preserves list fields (#1205).
vec_data() now preserves data frames. This is
consistent with the notion that data frames are a primitive vector type
in vctrs. This shouldn’t affect code that uses [[ and
length() to manipulate the data. On the other hand, the
vctrs primitives like vec_slice() will now operate rowwise
when vec_data() returns a data frame.
outer is now passed unrecycled to name
specifications. Instead, the return value is recycled (#1099).
Name specifications can now return NULL. The names
vector will only be allocated if the spec function returns
non-NULL during the concatenation. This makes it possible
to ignore outer names without having to create an empty names vector
when there are no inner names:
library(vctrs)
zap_outer_spec <- function(outer, inner) if (rlang::is_character(inner)) inner
# `NULL` names rather than a vector of ""
names(vec_c(a = 1:2, .name_spec = zap_outer_spec))
#> NULL
# Names are allocated when inner names exist
names(vec_c(a = 1:2, c(b = 3L), .name_spec = zap_outer_spec))
#> [1] ""  ""  "b"
Fixed several performance issues in vec_c() and
vec_unchop() with named vectors.
The restriction that S3 lists must have a list-based proxy to be
considered lists by vec_is_list() has been removed
(#1208).
New performant data_frame() constructor for creating
data frames in a way that follows tidyverse semantics. Among other
things, inputs are recycled using tidyverse recycling rules, strings are
never converted to factors, list-columns are easier to create, and
unnamed data frame input is automatically spliced.
New df_list() for safely and consistently
constructing the data structure underlying a data frame, a named list of
equal-length vectors. It is useful in combination with
new_data_frame() for creating user-friendly constructors
for data frame subclasses that use the tidyverse rules for recycling and
determining types.
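A minimal sketch of both constructors with toy columns. The "my_df" class is hypothetical and stands in for your own data frame subclass.

```r
library(vctrs)

# Size-1 inputs are recycled, strings stay character, list-columns are easy
df <- data_frame(x = 1:3, y = "a", z = list(1, 2:3, "b"))

# df_list() recycles and validates, returning a named list of equal-size columns
fields <- df_list(x = 1:3, y = "a")

# Hypothetical "my_df" subclass built on top of df_list()
my_df <- new_data_frame(fields, class = "my_df")
print(class(my_df))
```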
Fixed performance issue with vec_order() on classed
vectors which affected dplyr::group_by()
(tidyverse/dplyr#5423).
vec_set_names() no longer alters the input in-place
(#1194).
New vec_proxy_order() that provides an ordering
proxy for use in vec_order() and vec_sort().
The default method falls through to vec_proxy_compare().
Lists are special cased, and return an integer vector proxy that orders
by first appearance.
List columns in data frames are no longer comparable through
vec_compare().
The experimental relax argument has been removed
from vec_proxy_compare().
Fixed a performance issue in bind_rows() with S3
columns (#1122, #1124, #1151, tidyverse/dplyr#5327).
vec_slice() now checks sizes of data frame columns
in case the data structure is corrupt (#552).
The native routines in vctrs now dispatch and evaluate in the vctrs namespace. This improves the continuity of evaluation in backtraces.
new_data_frame() is now twice as fast when
class is supplied.
New vec_names2(), vec_names() and
vec_set_names() (#1173).
vec_slice() no longer restores attributes of foreign
objects for which a [ method exists. This fixes an issue
with ts objects which were previously incorrectly
restored.
The as.list() method for vctrs_rcrd
objects has been removed in favor of directly using the method for
vctrs_vctr, which calls vec_chop().
vec_c() and vec_rbind() now fall back
to base::c() if the inputs have a common class hierarchy
for which a c() method is implemented but no self-to-self
vec_ptype2() method is implemented.
vec_rbind() now internally calls
vec_proxy() and vec_restore() on the data
frame common type that is used to create the output (#1109).
vec_as_location2("0") now works correctly
(#1131).
?reference-faq-compatibility is a new reference
guide on vctrs primitives. It includes an overview of the fallbacks to
base R generics implemented in vctrs for compatibility with existing
classes.
The documentation of vctrs functions now includes a Dependencies section to reference which other vctrs operations are called from that function. By following the dependencies links recursively, you will find the vctrs primitives on which an operation relies.
c.factor() method.
This version features an overhaul of the coercion system to make it more consistent and easier to implement. See the Breaking changes and Type system sections for details.
There are three new documentation topics if you’d like to learn how to implement coercion methods to make your class compatible with tidyverse packages like dplyr:
https://vctrs.r-lib.org/reference/theory-faq-coercion.html for an overview of the coercion mechanism in vctrs.
https://vctrs.r-lib.org/reference/howto-faq-coercion.html for a practical guide about implementing methods for vectors.
https://vctrs.r-lib.org/reference/howto-faq-coercion-data-frame.html for a practical guide about implementing methods for data frames.
The following errors are caused by breaking changes.
"Can't convert <character> to <list>."
vec_cast() no longer converts to list. Use
vec_chop() or as.list() instead.
"Can't convert <integer> to <character>."
vec_cast() no longer converts to character. Use
as.character() to deparse objects.
"names for target but not for current"
Names of list-columns are now preserved by vec_rbind().
Adjust tests accordingly.
Double-dispatch methods for vec_ptype2() and
vec_cast() are no longer inherited (#710). Class
implementers must implement one set of methods for each compatible
class.
For example, a tibble subclass no longer inherits from the
vec_ptype2() methods between tbl_df and
data.frame. This means that you explicitly need to
implement vec_ptype2() methods with tbl_df and
data.frame.
This change requires a bit more work from class maintainers but is
safer because the coercion hierarchies are generally different from
class hierarchies. See the S3 dispatch section of
?vec_ptype2 for more information.
vec_cast() is now restricted to the same conversions
as vec_ptype2() methods (#606, #741). This change is
motivated by safety and performance:
It is generally sloppy to generically convert arbitrary inputs to one type. Restricted coercions are more predictable and allow your code to fail earlier when there is a type issue.
When unrestricted conversions are useful, this is generally
towards a known type. For example, glue::glue() needs to
convert arbitrary inputs to the known character type. In this case,
using double dispatch instead of a single dispatch generic like
as.character() is wasteful.
To implement the useful semantics of coercible casts (already
used in vec_assign()), two double dispatches were needed. Now
it can be done with one double dispatch by calling
vec_cast() directly.
stop_incompatible_cast() now throws an error of
class vctrs_error_incompatible_type rather than
vctrs_error_incompatible_cast. This means that
vec_cast() also throws errors of this class, which better
aligns it with vec_ptype2() now that they are restricted to
the same conversions.
The y argument of
stop_incompatible_cast() has been renamed to
to, to better match to_arg.
Double-dispatch methods for vec_ptype2() and
vec_cast() are now easier to implement. They no longer need
any of the boilerplate. Implementing a method for classes foo
and bar is now as simple as:
#' @export
vec_ptype2.foo.bar <- function(x, y, ...) new_foo()
vctrs also takes care of implementing the default and unspecified methods. If you have implemented these methods, they are no longer called and can now be removed.
One consequence of the new dispatch mechanism is that
NextMethod() is now completely unsupported. This is for the
best as it never worked correctly in a double-dispatch setting. Parent
methods must now be called manually.
vec_ptype2() methods now get zero-size prototypes as
inputs. This guarantees that methods do not peek at the data to
determine the richer type.
vec_is_list() no longer allows S3 lists that
implement a vec_proxy() method to automatically be
considered lists. An S3 list must explicitly inherit from
"list" in the base class to be considered a list.
vec_restore() no longer restores row names if the
target is not a data frame. This fixes an issue where
POSIXlt objects would carry a row.names
attribute after a proxy/restore roundtrip.
vec_cast() to and from data frames preserves the row
names of inputs.
The internal function vec_names() now returns row
names if the input is a data frame. Similarly,
vec_set_names() sets row names on data frames. This is part
of a general effort at making row names the vector names of data frames
in vctrs.
If necessary, the row names are repaired verbosely but without error to make them unique. This should be a mostly harmless change for users, but it could break unit tests in packages if they make assumptions about the row names.
With the double dispatch changes, the coercion methods are no longer inherited from parent classes. This is because the coercion hierarchy is in principle different from the S3 hierarchy. A consequence of this change is that subclasses that don’t implement coercion methods are now in principle incompatible.
This is particularly problematic with subclasses of data frames for
which throwing incompatible errors would be too inconvenient for users.
To work around this, we have implemented a fallback to the relevant base
data frame class (either data.frame or tbl_df)
in coercion methods (#981). This fallback is silent unless you set the
vctrs:::warn_on_fallback option to TRUE.
In the future we may extend this fallback principle to other base
types when they are explicitly included in the class vector (such as
"list").
Improved support for foreign classes in the combining operations
vec_c(), vec_rbind(), and
vec_unchop(). A foreign class is a class that doesn’t
implement vec_ptype2(). When all the objects to combine
have the same foreign class, one of these fallbacks is invoked:
If the class implements a base::c() method, the
method is used for the combination. (FIXME: vec_rbind()
currently doesn’t use this fallback.)
Otherwise if the objects have identical attributes and the same base type, we consider them to be compatible. The vectors are concatenated and the attributes are restored (#776).
These fallbacks do not make your class completely compatible with vctrs-powered packages, but they should help in many simple cases.
vec_c() and vec_unchop() now fall back
to base::c() for S4 objects if the object doesn’t implement
vec_ptype2() but sets an S4 c() method
(#919).
vec_rbind() and vec_c() with data frame
inputs now consistently preserve the names of list-columns, df-columns,
and matrix-columns (#689). This can cause some false positives in unit
tests, if they are sensitive to internal names (#1007).
vec_rbind() now repairs row names silently to avoid
confusing messages when the row names are not informative and were not
created on purpose.
vec_rbind() gains an option to treat input names as row
names. This is disabled by default (#966).
New vec_rep() and vec_rep_each() for
repeating an entire vector and elements of a vector, respectively. These
two functions provide a clearer interface for the functionality of
vec_repeat(), which is now deprecated.
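The difference between the two functions is clearest on a short toy vector:

```r
library(vctrs)

whole <- vec_rep(1:2, times = 3)                 # repeat the whole vector
each <- vec_rep_each(1:2, times = 3)             # repeat each element
per_elt <- vec_rep_each(1:2, times = c(2, 0))    # one count per element

print(whole)
print(each)
print(per_elt)
```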
vec_cbind() now calls vec_restore() on
inputs emptied of their columns before computing the common type. This
has consequences for data frame classes with special columns that
devolve into simpler classes when the columns are subsetted out. These
classes are now always simplified by vec_cbind().
For instance, column-binding a grouped data frame with a data frame now produces a tibble (the simplified class of a grouped data frame).
vec_match() and vec_in() gain
parameters for argument tags (#944).
The internal version of vec_assign() now has support
for assigning names and inner names. For data frames, the names are
assigned recursively.
vec_assign() gains x_arg and
value_arg parameters (#918).
vec_group_loc(), which powers
dplyr::group_by(), now has more efficient vector access
(#911).
vec_ptype() gained an x_arg
argument.
New list_sizes() for computing the size of every
element in a list. list_sizes() is to
vec_size() as lengths() is to
length(), except that it only supports lists. Atomic
vectors and data frames result in an error.
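The vec_size()/length() analogy matters mostly for data frame elements, whose size is their number of rows rather than their number of columns. A short sketch, assuming a recent vctrs release:

```r
library(vctrs)

x <- list(1:3, letters[1:2], data.frame(a = 1:4))

# vec_size() of each element: a data frame's size is its row count
list_sizes(x)
#> [1] 3 2 4

# lengths() counts the data frame's columns instead
lengths(x)
#> [1] 3 2 1

# Non-list inputs are rejected
try(list_sizes(1:3))
```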
new_data_frame() infers size from row names when
n = NULL (#894).
vec_c() now accepts rlang::zap() as
.name_spec input. The returned vector is then always
unnamed, and the names do not cause errors when they can’t be combined.
They are still used to create more informative messages when the inputs
have incompatible types (#232).
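As an illustration, an outer name combined with an inner name would normally be an error. With rlang::zap() as the name specification, both are discarded. A minimal sketch, assuming a recent vctrs release (rlang is installed alongside vctrs as a dependency):

```r
library(vctrs)

# Without `.name_spec`, the outer name `a` clashes with the inner name `x`.
# zap() drops all names, so the result is an unnamed double vector.
vec_c(a = c(x = 1), b = 2, .name_spec = rlang::zap())
#> [1] 1 2
```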
vctrs now supports the data.table class. The common
type of a data frame and a data table is a data table.
new_vctr() now always appends a base
"list" class to list .data to be compatible
with changes to vec_is_list(). This affects
new_list_of(), which now returns an object with a base
class of "list".
dplyr methods are now implemented for vec_restore(),
vec_ptype2(), and vec_cast(). The user-visible
consequence (and breaking change) is that row-binding a grouped data
frame and a data frame or tibble now returns a grouped data frame. It
would previously return a tibble.
The is.na<-() method for vctrs_vctr
now supports numeric and character subscripts to indicate where to
insert missing values (#947).
Improved support for vector-like S4 objects (#550, #551).
The base classes AsIs and table have
vctrs methods (#904, #906).
POSIXlt and POSIXct vectors are handled
more consistently (#901).
Ordered factors that do not have identical levels are now incompatible with each other. Ordered factors are also now incompatible with all unordered factors.
vec_as_subscript() now fails when the subscript is a
matrix or an array, consistently with
vec_as_location().
Improved error messages in vec_as_location() when
subscript is a matrix or array (#936).
vec_as_location2() properly picks up
subscript_arg (tidyverse/tibble#735).
vec_as_names() now has more informative error
messages when names are not unique (#882).
vec_as_names() gains a repair_arg
argument that when set will cause repair = "check_unique"
to generate an informative hint (#692).
stop_incompatible_type() now has an
action argument for customizing whether the coercion error
came from vec_ptype2() or vec_cast().
stop_incompatible_cast() is now a thin wrapper around
stop_incompatible_type(action = "convert").
stop_ functions now take details after
the dots. This argument can no longer be passed by position.
Supplying both details and message to
the stop_ functions is now an internal error.
x_arg, y_arg, and to_arg
are now compulsory arguments in stop_ functions like
stop_incompatible_type().
Lossy cast errors are now considered internal. Please don’t test for the class or explicitly handle them.
New argument loss_type for the experimental function
maybe_lossy_cast(). It can take the values “precision” or
“generality” to indicate in the error message which kind of loss the
error is about (double to integer loses precision, character to factor
loses generality).
Coercion and recycling errors are now more consistent.
Fixed clang-UBSAN error “nan is outside the range of representable values of type ‘int’” (#902).
Fixed compilation of the stability vignette following the date conversion changes in R-devel.
Methods for factors and dates are now implemented in C for efficiency.
new_data_frame() now correctly updates attributes
and supports merging of the "names" and
"row.names" arguments (#883).
vec_match() gains an na_equal argument
(#718).
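The effect of na_equal is clearest with a missing needle. This sketch assumes a recent vctrs release, where na_equal defaults to TRUE and has to be passed by name.

```r
library(vctrs)

haystack <- c(1, NA, 3)

# By default, a missing needle matches a missing value in the haystack
vec_match(NA, haystack)
#> [1] 2

# With na_equal = FALSE, missing values never match
vec_match(NA, haystack, na_equal = FALSE)
#> [1] NA
```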
vec_chop()’s indices argument has been
restricted to positive integer vectors. Character and logical subscripts
haven’t proven useful, and this aligns vec_chop() with
vec_unchop(), for which only positive integer vectors make
sense.
New vec_unchop() for combining a list of vectors
into a single vector. It is similar to vec_c(), but gives
greater control over how the elements are placed in the output through
the use of a secondary indices argument.
Breaking change: When .id is supplied,
vec_rbind() now creates the identifier column at the start
of the data frame rather than at the end.
numeric_version and package_version
lists are now treated as vectors (#723).
vec_slice() now properly handles symbols and S3
subscripts.
vec_as_location() and
vec_as_subscript() are now fully implemented in C for
efficiency.
num_as_location() gains a new argument,
zero, for controlling whether to "remove",
"ignore", or "error" on zero values
(#852).
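A minimal sketch of the zero argument, assuming a recent vctrs release. Only the "remove" (default) and "error" options are shown.

```r
library(vctrs)

i <- c(0, 2)

# Zeros are dropped, as in base R subsetting
num_as_location(i, n = 3, zero = "remove")
#> [1] 2

# Zeros are rejected with an informative error
try(num_as_location(i, n = 3, zero = "error"))
```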
The main feature of this release is considerable performance improvements with factors and dates.
vec_c() now falls back to base::c() if
the vector doesn’t implement vec_ptype2() but implements
c(). This should improve the compatibility of vctrs-based
functions with foreign classes (#801).
new_data_frame() is now faster.
New vec_is_list() for detecting if a vector is a
list in the vctrs sense. For instance, objects of class lm
are not lists. In general, classes need to explicitly inherit from
"list" to be considered as lists by vctrs.
Unspecified vectors of NA can now be assigned into a
list (#819).
x <- list(1, 2)
vec_slice(x, 1) <- NA
x
#> [[1]]
#> NULL
#>
#> [[2]]
#> [1] 2
vec_ptype() now errors on scalar inputs
(#807).
vec_ptype_finalise() is now recursive over all data
frame types, ensuring that unspecified columns are correctly finalised
to logical (#800).
vec_ptype() now correctly handles unspecified
columns in data frames, and will always return an unspecified column
type (#800).
vec_slice() and vec_chop() now work
correctly with bit64::integer64() objects when an
NA subscript is supplied. By extension, this means that
vec_init() now works with these objects as well
(#813).
vec_rbind() now binds row names. When named inputs
are supplied and names_to is NULL, the names
define row names. If names_to is supplied, the names are
copied into that column as before.
vec_cbind() now uses the row names of the first
named input.
The c() method for vctrs_vctr now
throws an error when recursive or use.names is
supplied (#791).
New vec_as_subscript() function to cast inputs to
the base type of a subscript (logical, numeric, or character).
vec_as_index() has been renamed to
vec_as_location(). Use num_as_location() if
you need more options to control how numeric subscripts are converted to
a vector of locations.
New vec_as_subscript2(),
vec_as_location2(), and num_as_location2()
variants for validating scalar subscripts and locations (e.g. for
indexing with [[).
vec_as_location() now preserves names of its inputs
if possible.
vec_ptype2() methods for base classes now prevent
inheritance. This makes sense because the subtyping graph created by
vec_ptype2() methods is generally not the same as the
inheritance relationships defined by S3 classes. For instance,
subclasses are often a richer type than their superclasses, and should
often be declared as supertypes (e.g. vec_ptype2() should
return the subclass).
We introduced this breaking change in a patch release because
new_vctr() now adds the base type to the class vector by
default, which caused vec_ptype2() to dispatch erroneously
to the methods for base types. We’ll finish switching to this approach
in vctrs 0.3.0 for the rest of the base S3 classes (dates, data frames,
…).
vec_equal_na() now works with complex
vectors.
The vctrs_vctr class gains an as.POSIXlt()
method (#717).
vec_is() now ignores names and row names
(#707).
vec_slice() now supports Altvec vectors (@jimhester,
#696).
vec_proxy_equal() is now applied recursively across
the columns of data frames (#641).
vec_split() no longer returns the val
column as a list_of. It is now returned as a bare list
(#660).
Complex numbers are now coercible with integer and double (#564).
zeallot has been moved from Imports to Suggests, meaning that
%<-% is no longer re-exported from vctrs.
vec_equal() no longer propagates missing values when
comparing list elements. This means that
vec_equal(list(NULL), list(NULL)) will continue to return
NA because NULL is the missing element for a
list, but now vec_equal(list(NA), list(NA)) returns
TRUE because the NA values are compared
directly without checking for missingness.
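The two cases described above look like this. It is a minimal sketch assuming a recent vctrs release, with the default na_equal = FALSE.

```r
library(vctrs)

# NULL is the missing element of a list, so the result is missing
vec_equal(list(NULL), list(NULL))
#> [1] NA

# NA inside a list is an ordinary element and is compared directly
vec_equal(list(NA), list(NA))
#> [1] TRUE
```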
Lists of expressions are now supported in
vec_equal() and functions that compare elements, such as
vec_unique() and vec_match(). This ensures
that they work with the result of modeling functions like
glm() and mgcv::gam() which store “family”
objects containing expressions (#643).
new_vctr() gains an experimental
inherit_base_type argument which determines whether or not
the class of the underlying type will be included in the class.
list_of() now inherits explicitly from “list”
(#593).
vec_ptype() has relaxed default behaviour for base
types; now if two vectors both inherit from (e.g.) “character”, the
common type is also “character” (#497).
vec_equal() now correctly treats NULL
as the missing value element for lists (#653).
vec_cast() now casts data frames to lists rowwise,
i.e. to a list of data frames of size 1. This preserves the invariant of
vec_size(vec_cast(x, to)) == vec_size(x) (#639).
Positive and negative 0 are now considered equivalent by all functions that check for equality or uniqueness (#637).
New experimental functions vec_group_rle() for
returning run length encoded groups; vec_group_id() for
constructing group identifiers from a vector;
vec_group_loc() for computing the locations of unique
groups in a vector (#514).
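A short sketch of two of these grouping helpers, assuming a recent vctrs release. Groups are numbered in order of first appearance, and vec_group_id() stores the number of groups in an "n" attribute.

```r
library(vctrs)

x <- c("b", "a", "b", "c", "a")

# One integer identifier per element
vec_group_id(x)
#> [1] 1 2 1 3 2
#> attr(,"n")
#> [1] 3

# One row per unique value: `key` holds the value, `loc` its locations
loc <- vec_group_loc(x)
loc$key
#> [1] "b" "a" "c"
```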
New vec_chop() for repeatedly slicing a vector. It
efficiently captures the pattern of
map(indices, vec_slice, x = x).
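The equivalence can be checked directly, using base lapply() in place of purrr::map(). This is a minimal sketch assuming a recent vctrs release. indices is passed by name, which works in both older and newer signatures of vec_chop().

```r
library(vctrs)

x <- c("a", "b", "c", "d")
indices <- list(1:2, 3:4)

# One slice of `x` per element of `indices`
vec_chop(x, indices = indices)

# The same result via repeated vec_slice() calls
lapply(indices, vec_slice, x = x)
```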
Support for multiple character encodings has been added to
functions that compare elements within a single vector, such as
vec_unique(), and across multiple vectors, such as
vec_match(). When multiple encodings are encountered, a
translation to UTF-8 is performed before any comparisons are made (#600,
#553).
Equality and ordering methods are now implemented for raw and complex vectors (@romainfrancois).
Maintenance release for CRAN checks.
With the 0.2.0 release, many vctrs functions have been rewritten with
native C code to improve performance. Functions like
vec_c() and vec_rbind() should now be fast
enough to be used in packages. This is an ongoing effort, for instance
the handling of factors and dates has not been rewritten yet. These
classes still slow down vctrs primitives.
The API in 0.2.0 has been updated; please see the list of breaking
changes below. vctrs has now graduated from experimental to a maturing
package. Please note that API changes are still planned for future
releases, for instance vec_ptype2() and
vec_cast() might need to return a sentinel instead of
failing with an error when there is no common type or possible cast.
Lossy casts now throw errors of type
vctrs_error_cast_lossy. Previously these were warnings. You
can suppress these errors selectively with
allow_lossy_cast() to get the partial cast results. To
implement your own lossy cast operation, call the new exported function
maybe_lossy_cast().
vec_c() now fails when an input is supplied with a
name but has internal names or is length > 1:
vec_c(foo = c(a = 1))
#> Error: Can't merge the outer name `foo` with a named vector.
#> Please supply a `.name_spec` specification.
vec_c(foo = 1:3)
#> Error: Can't merge the outer name `foo` with a vector of length > 1.
#> Please supply a `.name_spec` specification.
You can supply a name specification that describes how to combine the external name of the input with its internal names or positions:
# Name spec as glue string:
vec_c(foo = c(a = 1), .name_spec = "{outer}_{inner}")
# Name spec as a function:
vec_c(foo = c(a = 1), .name_spec = function(outer, inner) paste(outer, inner, sep = "_"))
vec_c(foo = c(a = 1), .name_spec = ~ paste(.x, .y, sep = "_"))
vec_empty() has been renamed to
vec_is_empty().
vec_dim() and vec_dims() are no longer
exported.
vec_na() has been renamed to
vec_init(), as the primary use case is to initialize an
output container.
vec_slice<- is now type stable (#140). It always
returns the same type as the LHS. If needed, the RHS is cast to the
correct type, but only if both inputs are coercible. See examples in
?vec_slice.
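A minimal illustration of this type stability, assuming a recent vctrs release. The left-hand side keeps its type, and a right-hand side that would lose information is rejected.

```r
library(vctrs)

x <- c(1L, 2L, 3L)

# The double 10 converts to integer without loss, so `x` stays integer
vec_slice(x, 2) <- 10
typeof(x)
#> [1] "integer"

# 1.5 cannot be cast to integer without losing precision: this errors
# and `x` is left unchanged
try(vec_slice(x, 1) <- 1.5)
```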
We have renamed the type particle to
ptype:
vec_type() => vec_ptype()
vec_type2() => vec_ptype2()
vec_type_common() => vec_ptype_common()
Consequently, vec_ptype() was renamed to
vec_ptype_show().
New vec_proxy() generic. This is the main
customisation point in vctrs along with vec_restore(). You
should only implement it when your type is designed around a non-vector
class. In this case, vec_proxy() should return a vector class (an
atomic vector, a bare list, or a data frame). The vctrs
operations will be applied on the proxy and vec_restore()
is called to restore the original representation of your type.
The most common case where you need to implement
vec_proxy() is for S3 lists. In vctrs, S3 lists are treated
as scalars by default. This way we don’t treat objects like model fits
as vectors. To prevent vctrs from treating your S3 list as a scalar,
unclass it from the vec_proxy() method. For instance, here
is the definition for list_of:
#' @export
vec_proxy.vctrs_list_of <- function(x) {
  unclass(x)
}
If you inherit from vctrs_vctr or
vctrs_rcrd you don’t need to implement
vec_proxy().
vec_c(), vec_rbind(), and
vec_cbind() gain a .name_repair argument
(#227, #229).
vec_c(), vec_rbind(),
vec_cbind(), and all functions relying on
vec_ptype_common() now have more informative error messages
when some of the inputs have nested data frames that are not
convergent:
df1 <- tibble(foo = tibble(bar = tibble(x = 1:3, y = letters[1:3])))
df2 <- tibble(foo = tibble(bar = tibble(x = 1:3, y = 4:6)))
vec_rbind(df1, df2)
#> Error: No common type for `..1$foo$bar$y` <character> and `..2$foo$bar$y` <integer>.
vec_cbind() now turns named data frames into packed
columns.
data <- tibble::tibble(x = 1:3, y = letters[1:3])
data <- vec_cbind(data, packed = data)
data
# A tibble: 3 x 3
      x y     packed$x $y
  <int> <chr>    <int> <chr>
1     1 a            1 a
2     2 b            2 b
3     3 c            3 cPacked data frames are nested in a single column. This makes it possible to access it through a single name:
data$packed
# A tibble: 3 x 2
      x y
  <int> <chr>
1     1 a
2     2 b
3     3 c
We are planning to use this syntax more widely in the tidyverse.
New vec_is() function to check whether a vector
conforms to a prototype and/or a size. Unlike vec_assert(),
it doesn’t throw errors but returns TRUE or
FALSE (#79).
Called without a specific type or size, vec_assert()
tests whether an object is a data vector or a scalar. S3 lists are
treated as scalars by default. Implement a vec_is_vector()
method for your class to override this property (or derive from
vctrs_vctr).
New vec_order() and vec_sort() for
ordering and sorting generalised vectors.
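Generalised vectors include data frames, which are ordered row-wise with later columns breaking ties. A minimal sketch, assuming a recent vctrs release:

```r
library(vctrs)

df <- data.frame(x = c(2, 1, 2), y = c("b", "a", "a"))

# Rows ordered by `x`, ties broken by `y`
vec_order(df)
#> [1] 2 3 1

# The same rows, sorted
vec_sort(df)
```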
New .names_to parameter for
vec_rbind(). If supplied, this should be the name of a
column where the names of the inputs are copied. This is similar to the
.id parameter of dplyr::bind_rows().
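For example, the input names can be recorded in an "id" column. This sketch assumes a recent vctrs release, where the identifier column is placed first (see the later breaking change about .id).

```r
library(vctrs)

out <- vec_rbind(
  a = data.frame(x = 1),
  b = data.frame(x = 2),
  .names_to = "id"
)

# `id` holds the input names; `x` holds the bound values
out
```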
New vec_seq_along() and
vec_init_along() create useful sequences (#189).
vec_slice() now preserves character row names, if
present.
New vec_split(x, by) is a generalisation of
split() that can divide a vector into groups formed by the
unique values of another vector. Returns a two-column data frame
containing unique values of by aligned with matching
x values (#196).
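A small example of the two-column result, assuming a recent vctrs release. Keys appear in order of first appearance, and each val element holds the matching slice of x.

```r
library(vctrs)

res <- vec_split(x = c(1, 2, 3, 4), by = c("a", "b", "a", "b"))

res$key
#> [1] "a" "b"

# The values of `x` for each key
res$val[[1]]
#> [1] 1 3
res$val[[2]]
#> [1] 2 4
```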
Failed assertions now throw errors of class "vctrs_error_assert",
and errors on incompatible types are of class
"vctrs_error_incompatible" (with subclasses
_type, _cast and _op) (#184).
Character indexing is now only supported for named objects; an error is raised for unnamed objects (#171).
Predicate generics now consistently return logical vectors when
passed a vctrs_vctr class. They used to restore the output
to their input type (#251).
list_of() now has an as.character()
method. It uses vec_ptype_abbr() to collapse complex
objects into their type representation (tidyverse/tidyr#654).
New stop_incompatible_size() to signal a failure due
to mismatched sizes.
New validate_list_of() (#193).
vec_arith() is consistent with base R when combining
difftime and date, with a warning if casts are
lossy (#192).
vec_c() and vec_rbind() now handle
data.frame columns properly (@yutannihilation, #182).
vec_cast(x, data.frame()) preserves the number of
rows in x.
vec_equal() now handles missing values symmetrically
(#204).
vec_equal_na() now returns TRUE for
data frames and records when every component is missing, not when
any component is missing (#201).
vec_init() checks that its input is a vector.
vec_proxy_compare() gains an experimental
relax argument, which allows data frames to be orderable
even if not all of their columns are (#210).
vec_size() now works with positive short row names.
This fixes issues with data frames created with jsonlite
(#220).
vec_slice<- now has a vec_assign()
alias. Use vec_assign() when you don’t want to modify the
original input.
vec_slice() now calls vec_restore()
automatically. Unlike the default [ method from base R,
attributes are preserved by default.
vec_slice() can correctly slice 0-row data frames
(#179).
New vec_repeat() for repeating each element of a
vector the same number of times.
vec_type2(x, data.frame()) ensures that the returned
object has names that are a length-0 character vector.