expect_base() now errors when var doesn't exist in the test data instead of passing (#71, @lachlangh).
get_testdata() now removes groups from the test data frame if it is a grouped_df (#68). Grouped data frames cause issues with several expectations.
Fix for upcoming R-devel change.
expect_all() failure messages now include more specific information on the location of erring values. In particular, variables failing the test are now highlighted in the error message. If, for instance, only one of three variables passed in as the vars argument fails the test, then only that variable will be mentioned in the failure message. This makes it easier to track down the cause of expectation failures when checking multiple variables at once. For example, expect_all(everything(), is.numeric, data = iris):
#> Error: `iris` has 150 records failing `is.numeric` on variable `Species`.
#> Vars: everything()
#> Filter: None
#> Arguments: ``
expect_all() sits behind many other expectations, which will also see a change to their failure message.
Soft deprecated expect_allany(). This function was implemented purely as a back-end for expect_all() and expect_any(). The change to expect_all() mentioned above differentiates it from expect_any() so that the two functions can no longer share a common back-end. Hence expect_allany() is now redundant.
Added expect_depends(), which allows you to test for functional dependency among variables. See the Wikipedia page for more information about functional dependencies.
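To make the idea concrete, here is a minimal base R sketch of what a functional dependency check looks for. It does not use expect_depends() or reproduce its implementation; the helper name depends_on() and the example data are hypothetical.

```r
# Hypothetical helper (not part of testdat): TRUE if every combination of the
# `on` columns maps to exactly one combination of the `vars` columns.
depends_on <- function(data, vars, on) {
  # Distinct (on, vars) combinations present in the data
  pairs <- unique(data[, c(on, vars), drop = FALSE])
  # If `vars` depends on `on`, no `on` combination appears more than once
  !any(duplicated(pairs[, on, drop = FALSE]))
}

# Illustrative data: postcode determines state in `ok` but not in `bad`
ok  <- data.frame(postcode = c(2000, 2000, 3000),
                  state    = c("NSW", "NSW", "VIC"))
bad <- data.frame(postcode = c(2000, 2000, 3000),
                  state    = c("NSW", "VIC", "VIC"))

depends_on(ok,  vars = "state", on = "postcode")
depends_on(bad, vars = "state", on = "postcode")
```

In `bad`, postcode 2000 is paired with two different states, so state is not functionally dependent on postcode.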
Fixed a bug that was causing a corrupted Excel file to be produced by output_results_excel() if the test suite encountered an error (#56).
The %E>% operator can be used to add expectations to a pipe chain.
mtcars %E>%
expect_base(mpg, TRUE) %>%
mutate(mpg = NA) %E>%
expect_base(mpg, FALSE)
This is a shorthand way of piping into the with_testdata() function.
mtcars %>%
with_testdata(expect_base(mpg, TRUE)) %>%
mutate(mpg = NA) %>%
with_testdata(expect_base(mpg, FALSE))
set_testdata() previously always returned a data frame, and evaluated the test data if it was stored as a quosure. It now returns the data as it was stored, to get around a bug when piping data into the with_testdata() function (#60).
As of testdat 0.3.0 we have moved to the tidyselect framework for variable selections instead of dplyr::vars(). tidyselect is the successor to vars(): it's a bit cleaner, provides some nifty features like selecting columns with a predicate function using where(), and finally allows us to get rid of the distinction between multi and single variable expectations.
Unfortunately this will break some code, but it's horrendously difficult to support both methods and it's best to switch before publishing to CRAN. Fortunately it's a simple fix: anywhere that you are currently using vars(), replace it with c():
# Old
expect_unique(vars(x, y))
# New
expect_unique(c(x, y))
The affected expectations are:
The uniqueness expectations: expect_unique(), expect_unique_across(), expect_unique_combine()
The exclusivity expectation: expect_exclusive()
The generic expectation helpers: expect_all(), expect_any(), expect_allany()
The checking helper functions: chk_filter(), chk_filter_vars(), chk_filter_all(), chk_filter_any()
A small number of functions have been hard deprecated as they are now redundant:
expect_where() and chk_filter_where() have been hard deprecated. They are now equivalent to the corresponding *_all() functions.
chk_filter_vars() has been renamed to chk_filter() and the existing chk_filter() has been removed to simplify the set of generic checking functions.
Auto-generated expectations have also changed slightly. Previously they could either accept a single unquoted variable name or a group of variables specified with vars(). They now always accept multiple columns using tidyselect syntax. As a result, the name of the first argument for these expectations has changed from var to vars, so be careful if you're using this as a named argument.
tidyselect syntax allows single unquoted variable names as well as arbitrary groups of variable specifications, and all of the auto-generated expectations in the package used the single variable variant so this shouldn't break existing code.
These are all valid column specifications using tidyselect:
expect_values(a, 1:10)
expect_values(c(a, b), 1:10)
expect_values(a:c, 1:10)
expect_values(matches("^[ab]$"), 1:10)
expect_values(c(matches("^[ab]$"), c), 1:10)
expect_values(where(is.numeric), 1:10)
expect_exclusive() has much improved documentation, and has had the argument name exc_vars updated to var_set to better reflect its purpose.
chk_pattern() has been renamed to chk_regex() to better reflect its purpose.
Soft deprecated context_data() (#43). context_data() is just a wrapper for set_testdata(), which has a much more intuitive name.
Soft deprecated expect_similar() (#18). It was a silly way of comparing data frames and we're better off making something new.
chk_blank() performs checks slightly differently and is much faster as a result (#46).
chk_ascii() was returning FALSE if it detected non-ASCII printable characters anywhere in the input vector. It now checks each element of the vector individually.
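The difference between a per-element check and a whole-vector check can be sketched in base R as follows. This is an illustration only, not the chk_ascii() implementation; the helper is_ascii_printable() and the pattern it uses are assumptions made for the example.

```r
# Hypothetical helper (not part of testdat): TRUE for elements made up only of
# printable ASCII characters (code points 0x20 to 0x7E).
is_ascii_printable <- function(x) {
  grepl("^[\\x20-\\x7E]*$", x, perl = TRUE)
}

x <- c("abc", "caf\u00e9", "xyz")

# Element-wise result: one value per element, so only the offending
# element is flagged
elementwise <- is_ascii_printable(x)

# Collapsed result: a single non-ASCII value fails the whole vector
collapsed <- all(is_ascii_printable(x))

elementwise
collapsed
```

With an element-wise result, the failing records can be located directly instead of the whole variable being reported as invalid.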
In addition to minor updates and bug fixes, this release does three main things:
Extends the expect_*() framework, e.g. by introducing 'fuzzy' expectations in the form of expect_prop_*().
Renames the .chk_filter() and chk_filter_*() functions to remove the . prefix, bringing them into line with the expectation functions (#9).
Retired filter_expect() (#25).
Hard deprecated chk_length(), chk_miss() and chk_nmiss() (#5).
Soft deprecated expect_func() (#24).
Soft deprecated expect_join() in favour of expect_subset() (#21).
Added chk_filter_where() and expect_where(), which can be used to perform scoped expectations (#8).
Added expect_unique_combine() for checking uniqueness across a combination of variables (#22).
Added expect_prop_*() functions for 'fuzzy' expectations. An example is expect_prop_nmiss(), which can be used to validate a dataset allowing for a certain amount of missingness (#12).
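The idea behind a 'fuzzy' expectation can be sketched in base R: rather than requiring every record to pass, the check passes when a large enough share of records do. The helper prop_nmiss_ok(), its arguments, and the "at least prop" threshold rule are assumptions made for illustration; they are not taken from the expect_prop_nmiss() implementation.

```r
# Hypothetical helper (not part of testdat): TRUE if at least `prop` of the
# values in `x` are non-missing, where `miss` lists the values treated as
# missing.
prop_nmiss_ok <- function(x, prop, miss = c(NA, "")) {
  mean(!(x %in% miss)) >= prop
}

# Illustrative data: 3 of 5 values are non-missing
x <- c("a", "b", NA, "d", "")

lenient <- prop_nmiss_ok(x, prop = 0.5)
strict  <- prop_nmiss_ok(x, prop = 0.9)

lenient
strict
```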
Added expect_labels() for testing variable and value labels (#20).
Added exclude argument to expect_unique() and expect_unique_*() to allow users to exclude specific values from the uniqueness check. This is particularly useful when the dataset contains missing codes (#26).
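As a rough base R illustration of why excluding values matters for a uniqueness check, consider an ID variable that uses -99 as a missing code. The helper all_unique() and the data are hypothetical and do not reflect how expect_unique() is implemented.

```r
# Hypothetical helper (not part of testdat): TRUE if the values of `x` are
# unique once any values listed in `exclude` have been dropped.
all_unique <- function(x, exclude = NULL) {
  kept <- x[!(x %in% exclude)]
  anyDuplicated(kept) == 0
}

# Illustrative data: -99 is a missing code that legitimately repeats
id <- c(1, 2, 3, -99, -99)

without_exclude <- all_unique(id)
with_exclude    <- all_unique(id, exclude = -99)

without_exclude
with_exclude
```

Without the exclusion, the repeated missing code makes an otherwise unique variable fail the check.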
Added quosure argument to use_testdata() (#1). This allows the dataset to be specified as a quosure, so all tests will be run against the current version.
Fixed bug in chk_filter() which prevented the user from using vars = vars(everything()).
Fixed deprecation warning that cropped up when using expect_similar() (#19).
Fixed bug in expect_similar() which prevented variables with different names from being compared (#17).
Fixed deprecation warning that cropped up when using chk_text_nmiss() (#16).
Documentation for existing functions has been expanded considerably.
Added ... argument to chk_range() to bring it into line with expect_range() (#28).
Refactored expect_values() to implement it using expect_make() (#27).
Added a new testdat.scipen option (default: 999) to avoid issues from checks converting numeric variables to scientific notation (#3).
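The kind of problem this guards against can be seen with base R's own scipen option. The sketch below only demonstrates how scipen changes number formatting; how testdat applies testdat.scipen internally is not shown here and should not be inferred from it.

```r
# Save the current setting so it can be restored afterwards
old <- options(scipen = 0)

# With the default penalty, large numbers are formatted in scientific notation
sci <- format(1e10)

# A large penalty keeps fixed notation, so string-based checks see every digit
options(scipen = 999)
fixed <- format(1e10)

# Restore the original setting
options(old)

sci
fixed
```

A check that compares values as text would treat these two representations of the same number differently.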
Added >= 0.8.0 dplyr version dependency (#2).
All calls to quo_label() now use as_label() instead, as recommended by rlang (#10). This may cause minor changes to printing of test results.
Initial release.
Removed start_data_test() and end_data_test() as they don't have a clear use case.
Added a new with_testdata() function for simpler interactive testing.
Moved use_testdat() into srcproj, which is a more natural home.
Removed ExcelReporter. It is unnecessary since we have Excel output from a ListReporter.
New expectation function factory expect_make() allows users to automagically convert a logical check function of the form used by chk_*() into an expectation. See ?expect_make for details.
chk_values() now takes the vector of missing values as an argument instead of a logical.
Renamed some chk_*() functions for clarity. For the moment the old names are soft-deprecated and will give a warning. They will be hard-deprecated in the next minor release.
chk_length() => chk_max_length()
chk_miss() => chk_text_miss()
chk_nmiss() => chk_text_nmiss()
Added new expectations and options:
expect_range() now takes extra allowable values in .... Specifically for cases where there is a bounded range with specified values outside (e.g. missing value codes).
expect_base() has a new flag missing_valid. If set to TRUE, missing values are considered valid responses for records that meet the base condition. This allows for one-way base checks, i.e. records not meeting the base condition should be missing, but for records meeting the base condition we don't care.
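The one-way base check described above can be sketched in base R. The helper base_ok() is hypothetical and only mirrors the logic stated in the text; it is not the expect_base() implementation, and it treats NA as the only missing value.

```r
# Hypothetical helper (not part of testdat): per-record result of a base check.
# Records outside the base must be missing. Records inside the base must be
# non-missing, unless `missing_valid` is TRUE, in which case they always pass.
base_ok <- function(x, base, missing_valid = FALSE) {
  is_missing <- is.na(x)
  ifelse(base, missing_valid | !is_missing, is_missing)
}

# Illustrative data: the first two records meet the base condition
x    <- c(5, NA, NA, 7)
base <- c(TRUE, TRUE, FALSE, FALSE)

two_way <- base_ok(x, base)
one_way <- base_ok(x, base, missing_valid = TRUE)

two_way
one_way
```

The second record (in base but missing) fails the two-way check and passes the one-way check; the fourth record (out of base but answered) fails both.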
Extensive cleanup of rlang usage to follow better practices. Expression capture for expectation messages should be much cleaner.
Simplified dependencies.
Documentation cleanup.
Removed R CMD check warnings.
Improved test coverage.