Package 'testdat' reference manual

Title:	Data Unit Testing for R
Description:	Test your data! An extension of the 'testthat' unit testing framework with a family of functions and reporting tools for checking and validating data frames.
Authors:	Danny Smith [aut, cre], Kinto Behr [aut], The Social Research Centre [cph]
Maintainer:	Danny Smith <[email protected]>
License:	MIT + file LICENSE
Version:	0.4.2.9000
Built:	2025-02-01 03:26:56 UTC
Source:	https://github.com/socialresearchcentre/testdat

Checks: dates

Description

Check that a vector conforms to a given date format such as YYYYMMDD.

Usage

chk_date_yyyymmdd(x)

chk_date_yyyymm(x)

chk_date_yyyy(x)

Arguments

`x`	A vector to check.

Value

A logical vector flagging records that have passed or failed the check.

Examples


date <- c(20210101, 20211301, 20210132, 202101, 2021)
chk_date_yyyymmdd(date)

date <- c(202101, 202112, 202113, 2021)
chk_date_yyyymm(date)

date <- c("0001", "1688", "1775", "1789", "1791", "1848")
chk_date_yyyy(date)
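To illustrate what a YYYYMMDD check involves, the base R sketch below combines a shape test (exactly eight digits) with a calendar test. `is_yyyymmdd()` is a hypothetical helper, not testdat's implementation, and testdat may treat edge cases such as `NA` differently.

```r
# Hypothetical helper approximating chk_date_yyyymmdd(); not testdat's source
is_yyyymmdd <- function(x) {
  x <- as.character(x)
  # Shape: exactly eight digits
  ok_shape <- grepl("^[0-9]{8}$", x)
  # Calendar: as.Date() returns NA for impossible dates such as month 13
  parsed <- as.Date(x, format = "%Y%m%d")
  ok_shape & !is.na(parsed)
}

date <- c(20210101, 20211301, 20210132, 202101, 2021)
is_yyyymmdd(date)
```

Only the first value passes: `20211301` has month 13, `20210132` has day 32, and the last two values are too short.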

Checks: dummy

Description

These functions provide common, simple data checks.

Usage

chk_dummy(x)

Arguments

`x`	A vector to check.

Value

A logical vector flagging records that have passed or failed the check.

Examples


chk_dummy(LETTERS)

Checks: data frame helpers

Description

These helper functions allow easy checking using an arbitrary function (func) over multiple columns (vars) of a data frame (data), with an optional filter (flt).

Usage

chk_filter(data, vars, func, flt = TRUE, args = list())

chk_filter_all(data, vars, func, flt = TRUE, args = list())

chk_filter_any(data, vars, func, flt = TRUE, args = list())

Arguments

`data`	A data frame to check.
`vars`	<`tidy-select`> A set of columns to check.
`func`	A function to use for checking that takes a vector as the first argument and returns a logical vector of the same length showing whether an element passed or failed.
`flt`	<`data-masking`> A filter specifying a subset of the data frame to test.
`args`	A list of additional arguments to be added to the function calls.

Details

chk_filter() applies func with args to vars in data filtered with flt and returns a data frame containing the resulting logical vectors.
chk_filter_all() and chk_filter_any() both run chk_filter() and return a single logical vector flagging whether all or any values in each row are TRUE (i.e. the conjunction and disjunction, respectively, of the columns in the output of chk_filter()).
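The flow described above can be sketched in base R under a few assumptions: `chk_filter_sketch()` and `in_range()` are hypothetical names, columns are selected by name rather than with tidy-select, and `flt` is an already-evaluated logical vector rather than a data-masking expression. Rows outside the filter become `NA`, and `Reduce()` with `&` or `|` gives the per-row conjunction or disjunction.

```r
# Hypothetical stand-in for chk_range(): inclusive range check
in_range <- function(x, min, max) x >= min & x <= max

# Hypothetical sketch of chk_filter(); not testdat's implementation
chk_filter_sketch <- function(data, vars, func, flt, args = list()) {
  # Apply `func` with the extra `args` to each selected column
  res <- as.data.frame(lapply(data[vars], function(col) do.call(func, c(list(col), args))))
  # Records that do not meet the filter are reported as NA
  res[!flt, ] <- NA
  res
}

df <- data.frame(a = c(1, 50, 150), b = c(10, 200, 20), grp = c(4, 4, 6))
res <- chk_filter_sketch(df, c("a", "b"), in_range, df$grp == 4, list(min = 0, max = 100))
Reduce(`&`, res)  # like chk_filter_all()
Reduce(`|`, res)  # like chk_filter_any()
```

With these inputs the conjunction is `TRUE, FALSE, NA` and the disjunction is `TRUE, TRUE, NA`; the third row is `NA` because it falls outside the filter.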

Value

A logical vector or data frame of logical vectors flagging records that have passed or failed the check, with NA where records do not meet the filter condition.

Examples


# Check that every 4-cylinder car has an engine displacement of < 100 cubic
# inches AND < 100 horsepower - return a data frame
chk_filter(
  mtcars,
  c("disp", "hp"),
  chk_range,
  cyl == 4,
  list(min = 0, max = 100)
)

# Check that every 4-cylinder car has an engine displacement of < 100 cubic
# inches AND < 100 horsepower
chk_filter_all(
  mtcars,
  c("disp", "hp"),
  chk_range,
  cyl == 4,
  list(min = 0, max = 100)
)

# Check that every 4-cylinder car has an engine displacement of < 100 cubic
# inches OR < 100 horsepower
chk_filter_any(
  mtcars,
  c("disp", "hp"),
  chk_range,
  cyl == 4,
  list(min = 0, max = 100)
)

# Check that columns made up of whole numbers are binary
chk_filter_all(
  mtcars,
  where(~ all(. %% 1 == 0)),
  chk_values,
  TRUE,
  list(0:1)
)

Checks: labels

Description

Check that a vector is labelled in a given way.

Usage

chk_labels(x, val_labels = NULL, var_label = NULL)

Arguments

x

A vector to check.

val_labels

What value label check should be performed? One of:

A character vector of expected value labels.
A named vector of expected label-value pairs.
TRUE to test for the presence of value labels in general.
FALSE to test for the absence of value labels.
NULL to ignore value labels when checking.

var_label

What variable label check should be performed? One of:

A character vector of expected variable labels.
TRUE to test for the presence of a variable label.
FALSE to test for the absence of a variable label.
NULL to ignore the variable label when checking.

Value

A logical vector flagging records that have passed or failed the check.

Examples


df <- data.frame(
  x = labelled::labelled(c("M", "M", "F"), c(Male = "M", Female = "F"), "Sex"),
  y = labelled::labelled(c("M", "M", "F"), c(Male = "M", Female = "F", Other = "X")),
  z = c("M", "M", "F")
)

# Check for a value-label pairing
chk_labels(df$x, c(Male = "M"))

# Check that two variables have the same value labels
chk_labels(df$x, labelled::val_labels(df$y))

# Check for the presence of a particular label
chk_labels(df$x, "Male")
chk_labels(df$x, var_label = "Sex")

# Check that a variable is labelled at all
chk_labels(df$z, val_labels = TRUE)
chk_labels(df$z, var_label = TRUE)

# Check that a variable isn't labelled
chk_labels(df$z, val_labels = FALSE)
chk_labels(df$z, var_label = FALSE)
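labelled::labelled() stores value labels in the `labels` attribute and the variable label in the `label` attribute, so the named-vector case can be approximated in base R. `has_val_labels()` is a hypothetical helper, not testdat's implementation; it returns one flag per record to match the Value description, and plain `structure()` attributes stand in for a labelled vector.

```r
# Hypothetical helper: do all expected label-value pairs appear on `x`?
has_val_labels <- function(x, expected) {
  labs <- attr(x, "labels", exact = TRUE)
  ok <- !is.null(labs) &&
    all(names(expected) %in% names(labs)) &&
    all(labs[names(expected)] == expected)
  # Repeat the single result so there is one flag per record
  rep(ok, length(x))
}

# Plain attributes stand in for a labelled vector here
x <- structure(c("M", "M", "F"), labels = c(Male = "M", Female = "F"), label = "Sex")
has_val_labels(x, c(Male = "M"))
has_val_labels(x, c(Other = "X"))
identical(attr(x, "label", exact = TRUE), "Sex")  # variable label check
```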

Checks: patterns

Description

Check that a vector conforms to a certain pattern.

Usage

chk_regex(x, pattern)

chk_max_length(x, len)

Arguments

`x`	A vector to check.
`pattern`	A str_detect() pattern to match.
`len`	Maximum string length.

Value

A logical vector flagging records that have passed or failed the check.

Examples


x <- c("a_1", "b_2", "c_2", NA, "NULL")
chk_regex(x, "[a-z]_[0-9]")
chk_max_length(x, 3)
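A `str_detect()` pattern matches anywhere in a string unless it is anchored. The base R sketch below shows the difference with `grepl()`, which uses R's default regex engine rather than stringr's ICU engine (equivalent for this simple pattern), plus a length check built on `nchar()`. How testdat itself treats `NA` may differ from the base R behaviour shown.

```r
x <- c("a_1", "b_2", "c_2", NA, "NULL", "xx_1yy")

# Unanchored, as with stringr::str_detect(): a match anywhere passes
unanchored <- grepl("[a-z]_[0-9]", x)
# Anchored with ^ and $: the whole string must match
anchored <- grepl("^[a-z]_[0-9]$", x)
# Length check; nchar() gives NA for a missing string
short_enough <- nchar(x) <= 3
```

Note that `"xx_1yy"` passes the unanchored pattern but fails the anchored one.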

Checks: text

Description

Check character vectors for non-ASCII characters or common NULL value placeholders.

Usage

chk_ascii(x)

chk_text_miss(x, miss = getOption("testdat.miss_text"))

chk_text_nmiss(x, miss = getOption("testdat.miss_text"))

Arguments

`x`	A vector to check.
`miss`	A vector of values to be treated as missing. The testdat.miss_text option is used by default.

Value

A logical vector flagging records that have passed or failed the check.

Examples


chk_ascii(c("a", "\U1f642")) # detect non-ASCII characters

imported_data <- c(1, "#n/a", 2, "", 3, NA)
chk_text_miss(imported_data)
chk_text_nmiss(imported_data) # Equivalent to !chk_text_miss(imported_data)
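Both ideas are easy to approximate in base R. In the sketch below, `is_ascii()` is a hypothetical helper that flags strings whose code points are all 127 or below, and `placeholders` is an assumed list of missing-value codes, not the actual value of the testdat.miss_text option. Treating `NA` as missing is also an assumption of the sketch.

```r
# Hypothetical approximation of chk_ascii(): every code point must be <= 127
is_ascii <- function(x) {
  vapply(x, function(s) all(utf8ToInt(s) <= 127L), logical(1), USE.NAMES = FALSE)
}
is_ascii(c("a", "\U1f642"))

# Hypothetical placeholder list, not testdat's default option value
placeholders <- c("", "#n/a")
imported_data <- c(1, "#n/a", 2, "", 3, NA)
is_placeholder <- is.na(imported_data) | imported_data %in% placeholders
is_placeholder
```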

Checks: uniqueness

Description

Check that each value in a vector is unique.

Usage

chk_unique(x)

Arguments

`x`	A vector to check.

Value

A logical vector flagging records that have passed or failed the check.

Examples


x <- c(NA, 1:10, NA)
chk_unique(x)

x <- c(10, 1:10, 10)
chk_unique(x)
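In base R, a record can be flagged as unique when it is duplicated neither from the front nor from the back. This is only a sketch: base `duplicated()` treats repeated `NA`s as duplicates of each other, and testdat's handling of missing values (the subject of the first example) may differ.

```r
x <- c(10, 1:10, 10)
# TRUE where the value occurs exactly once in x
is_unique <- !(duplicated(x) | duplicated(x, fromLast = TRUE))
is_unique
```

All three occurrences of `10` are flagged, including the first one.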

Checks: values

Description

Check that a vector contains only certain values.

Usage

chk_equals(x, val)

chk_values(x, ..., miss = getOption("testdat.miss"))

chk_range(x, min, max, ...)

chk_blank(x)

Arguments

`x`	A vector to check.
`val`	A scalar value for the equality check.
`...`	Vectors of valid values.
`miss`	A vector of values to be treated as missing. The testdat.miss option is used by default.
`min`	Minimum value for range check.
`max`	Maximum value for range check.

Value

A logical vector flagging records that have passed or failed the check.

Examples


x <- c(NA, 0, 1, 0.5, 0, NA, 99)
chk_blank(x) # Blank
chk_equals(x, 0) # Either blank or 0
chk_values(x, 0, 1) # Either blank, 0, 1, or 99
chk_range(x, 0, 1) # Either blank or in [0,1]
chk_range(x, 0, 1, 99) # Either blank, in [0,1], or equal to 99
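The example comments treat blanks as passing. Read that way, chk_range() with extra valid values in `...` can be sketched in base R as below. `in_range_sketch()` is hypothetical, and treating only `NA` as blank is a simplification of the testdat.miss option.

```r
# Hypothetical sketch: blank, within [min, max], or one of the extra values
in_range_sketch <- function(x, min, max, ...) {
  extra <- c(...)
  is.na(x) | (x >= min & x <= max) | x %in% extra
}

x <- c(NA, 0, 1, 0.5, 0, NA, 99)
in_range_sketch(x, 0, 1)      # only 99 fails
in_range_sketch(x, 0, 1, 99)  # everything passes
```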

Expectations: consistency

Description

These functions test whether multiple conditions coexist.

Usage

expect_cond(cond1, cond2, data = get_testdata())

expect_base(
  var,
  base,
  miss = getOption("testdat.miss"),
  missing_valid = FALSE,
  data = get_testdata()
)

Arguments

`cond1`	<`data-masking`> First condition (antecedent) for consistency check.
`cond2`	<`data-masking`> Second condition (consequent) for consistency check.
`data`	A data frame to test. The global test data is used by default.
`var`	An unquoted column name to test.
`base`	<`data-masking`> The condition that determines which records should be non-missing.
`miss`	A vector of values to be treated as missing. The testdat.miss option is used by default.
`missing_valid`	Should missing values be treated as valid for records meeting the `base` condition? This allows 'one way' base checks. This is `FALSE` by default.

Value

expect_*() functions are mainly called for their side effects. The expectation signals its result (e.g. "success", "failure"), which is logged by the current test reporter. In a non-testing context the expectation will raise an error with class expectation_failure if it fails.

Functions

expect_cond(): Checks the coexistence of two conditions. It can be read as "if cond1 then cond2".
expect_base(): A special case that checks missing data against a specified condition. It can be read as "if base then var not missing, if not base then var missing".
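Both rules reduce to simple logical operations. In the base R sketch below, `cond_ok()` encodes "if cond1 then cond2" as `!cond1 | cond2`, and `base_ok()` encodes the expect_base() rule; both names, and the `miss = c(NA, "")` default, are assumptions for illustration rather than testdat's implementation.

```r
# Hypothetical sketches; `miss` stands in for the testdat.miss option
cond_ok <- function(cond1, cond2) !cond1 | cond2  # "if cond1 then cond2"

base_ok <- function(var, base, miss = c(NA, ""), missing_valid = FALSE) {
  is_miss <- var %in% miss
  # Base records need a value (unless missing_valid); others must be missing
  ifelse(base, !is_miss | missing_valid, is_miss)
}

q1a <- c(0, 1, 0, 1, 0)
q1b <- c(NA, NA, NA, 1, 0)
q2a <- c(90, 80, 60, 40, 90)
base_ok(q1b, q1a %in% 1)       # records 2 and 5 fail
cond_ok(q1a %in% 0, q2a > 50)  # all records pass
```

These vectors are taken from the survey data in the Examples, so the sketch agrees with the comment there that the expect_base() check fails for resp_id 2 and 5.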

Examples

my_survey <- data.frame(
  resp_id = 1:5,
  q1a = c(0, 1, 0, 1, 0),
  q1b = c(NA, NA, NA, 1, 0), # Asked if q1a %in% 1
  q2a = c(90, 80, 60, 40, 90),
  q2b = c("", "", NA, "Some reason for low rating", "") # Asked if q2a < 50
)

# Check that q1b has a value if and only if q1a %in% 1
try(expect_base(q1b, q1a %in% 1, data = my_survey)) # Fails for resp_id 2 and 5

# Check that q2b has a value if and only if q2a < 50
expect_base(q2b, q2a < 50, data = my_survey)

# Check that if q1a %in% 0 then q2a > 50 (but not vice-versa)
expect_cond(q1a %in% 0, q2a > 50, data = my_survey)

Expectations: comparisons

Description

These functions allow for comparison between two data frames.

Usage

expect_valmatch(
  data2,
  vars,
  by,
  not = FALSE,
  flt = TRUE,
  data = get_testdata()
)

expect_subset(data2, by = NULL, not = FALSE, flt = TRUE, data = get_testdata())

Arguments

`data2`	The data frame to compare against.
`vars`	<`tidy-select`> A set of columns to test.
`by`	A character vector of columns to join by. See `dplyr::join()` for details.
`not`	Reverse the results of the check?
`flt`	<`data-masking`> A filter specifying a subset of the data frame to test.
`data`	A data frame to test. The global test data is used by default.

Details

expect_valmatch() compares the observations appearing in one data frame (data) to the same observations, as picked out by a key (by), in another data frame (data2). It fails if the selected columns (vars) aren't the same for those observations in both data frames.
expect_subset() compares one data frame (data) to another (data2) and fails unless every observation in the first, as picked out by a key (by), also appears in the second.
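The two comparisons can be illustrated with base R set membership and `merge()`. The data frames and the approach below are a hypothetical sketch, not testdat's implementation.

```r
# Hypothetical illustration; testdat's implementation may differ
df1 <- data.frame(id = 1:4, grp = c("a", "b", "a", "b"))
df2 <- data.frame(id = c(1, 2, 3, 5), grp = c("a", "b", "b", "b"))

# expect_subset() idea: every key in the tested data must appear in data2
key_found <- df1$id %in% df2$id
all(key_found)

# expect_valmatch() idea: join on the key, then compare the chosen column
joined <- merge(df1, df2, by = "id", suffixes = c(".x", ".y"))
joined$grp.x == joined$grp.y
```

Here the subset check would fail because id 4 is missing from `df2`, and the value match would fail for id 3, whose `grp` differs.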

Value

Examples


set.seed(1) # make the random example reproducible
df1 <- data.frame(
  id = 0:99,
  binomial = sample(0:1, 100, TRUE),
  even = abs(0:99 %% 2 - 1) * 0:99
)

df2 <- data.frame(
  id = 0:99,
  binomial = sample(0:1, 100, TRUE),
  odd = 0:99 %% 2 * 0:99
)


# Check that same records 'succeeded' across data frames
try(expect_valmatch(df2, binomial, by = "id", data = df1))

# Check that all records in `df1`, as picked out by `id`, exist in `df2`
expect_subset(df2, by = "id", data = df1)

Expectations: dates

Description

Test whether variables in a data frame conform to a given date format such as YYYYMMDD.

Usage

expect_date_yyyy(vars, flt = TRUE, data = get_testdata())

expect_date_yyyymm(vars, flt = TRUE, data = get_testdata())

expect_date_yyyymmdd(vars, flt = TRUE, data = get_testdata())

Arguments

`vars`	<`tidy-select`> A set of columns to test.
`flt`	<`data-masking`> A filter specifying a subset of the data frame to test.
`data`	A data frame to test. The global test data is used by default.

Value

Examples


sales <- data.frame(
  sale_id = 1:5,
  date = c("20200101", "20200101", "20200102", "20200103", "20220101"),
  quarter = c(202006, 202009, 202012, 20203, 20200101),
  published = c(1999, 19991, 21, 0001, 20200101)
)

try(expect_date_yyyymmdd(date, data = sales)) # Full date of sale valid
try(expect_date_yyyymm(quarter, data = sales)) # Quarters given as YYYYMM
try(expect_date_yyyy(published, data = sales)) # Publication years valid
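The YYYYMM rule can be sketched by appending a day before parsing. `is_yyyymm()` is a hypothetical base R helper assumed to mirror the intent of expect_date_yyyymm(); it is not how testdat implements the check.

```r
# Hypothetical approximation of the YYYYMM rule: six digits and a real month
is_yyyymm <- function(x) {
  x <- as.character(x)
  ok_shape <- grepl("^[0-9]{6}$", x)
  # Append day 01 so as.Date() can validate the month
  ok_month <- !is.na(as.Date(paste0(x, "01"), format = "%Y%m%d"))
  ok_shape & ok_month
}

quarter <- c(202006, 202009, 202012, 20203, 20200101)
is_yyyymm(quarter)
```

The values `20203` and `20200101` fail on shape alone, which is why the quarter example above is wrapped in `try()`.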

Expectations: exclusivity

Description

expect_exclusive tests that vars are exclusive - that, if any one of vars is set to exc_val, no other column in vars or var_set is also set to exc_val.

Usage

expect_exclusive(vars, var_set, exc_val = 1, flt = TRUE, data = get_testdata())

Arguments

`vars`	<`tidy-select`> A set of columns to test.
`var_set`	<`tidy-select`> The full set of columns to check against. This should include all columns specified in the `vars` argument.
`exc_val`	The value that flags a variable as "selected" (default: `1`)
`flt`	<`data-masking`> A filter specifying a subset of the data frame to test.
`data`	A data frame to test. The global test data is used by default.

Details

This expectation is designed to check exclusivity in survey multiple response sets, where one response is only valid on its own.

Consider the example data set in the Examples section:

No record should have q10_98, "None of the above", selected while also having any other response selected, so we refer to this as an "exclusive" response.
expect_exclusive() checks whether q10_98 "None of the above" or q10_99 "Item not answered", the exclusive responses, have been selected alongside any other q10_* response.
The expectation fails, since the first record has both q10_1 and q10_98 selected.
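The rule can be sketched in base R with row sums over the example data from the Examples section: a record is valid if it selects no exclusive response, or if the exclusive response is its only selection. This is an illustrative sketch, not testdat's implementation.

```r
# Hypothetical sketch using the example data; not testdat's implementation
my_q_block <- data.frame(
  q10_1  = c(1, 1, 0, 0, 0),
  q10_2  = c(0, 1, 0, 0, 0),
  q10_3  = c(0, 0, 1, 0, 0),
  q10_98 = c(1, 0, 0, 1, 0),
  q10_99 = c(0, 0, 0, 0, 1)
)
exclusive <- c("q10_98", "q10_99")

# Number of responses selected per record
n_selected <- rowSums(my_q_block == 1)
# Does the record select any exclusive response?
has_exclusive <- rowSums(my_q_block[exclusive] == 1) > 0
# Valid if no exclusive response, or the exclusive response stands alone
ok <- !has_exclusive | n_selected == 1
which(!ok)
```

Only the first record is flagged, consistent with the explanation above.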

Value

Examples


my_q_block <- data.frame(
  resp_id = 1:5, # Unique to respondent
  q10_1  = c(1, 1, 0, 0, 0),
  q10_2  = c(0, 1, 0, 0, 0),
  q10_3  = c(0, 0, 1, 0, 0),
  q10_98 = c(1, 0, 0, 1, 0), # None of the above
  q10_99 = c(0, 0, 0, 0, 1)  # Item not answered
)

# Make sure that if "None of the above" or "Item not answered" is selected,
# none of the other question options are selected:
try(
  expect_exclusive(
    c(q10_98, q10_99),
    starts_with("q10_"),
    data = my_q_block
  )
)

Expectations: functional dependency

Description

Test whether one set of variables functionally depends on another set of variables.

Usage

expect_depends(vars, on, flt = TRUE, data = get_testdata())

Arguments

`vars`	<`tidy-select`> A set of columns to test.
`on`	<`tidy-select`> A set of columns which `vars` are expected to depend on.
`flt`	<`data-masking`> A filter specifying a subset of the data frame to test.
`data`	A data frame to test. The global test data is used by default.

Details

One set of variables, X, functionally depends on another, Y, if and only if each value in Y corresponds to exactly one value in X. For instance, course_duration and course_topic functionally depend on course_code if each course_code corresponds to just one combination of course_duration and course_topic. That is, if two records have the same course_code then they must have the same course_duration and course_topic.

See the Wikipedia page on functional dependencies for more information.
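The definition above translates into a check on distinct combinations: after deduplicating the key and dependent columns together, no key value may appear twice. `depends_on()` is a hypothetical base R helper that takes column names as strings; it is a sketch of the definition, not testdat's implementation.

```r
# Hypothetical helper: does `vars` functionally depend on `on`?
depends_on <- function(data, vars, on) {
  # Distinct combinations of the key and dependent columns
  combos <- unique(data[c(on, vars)])
  # If a key value appears in more than one combination, dependency fails
  anyDuplicated(combos[on]) == 0
}

student_course <- data.frame(
  course_code = c(1, 2, 1, 3, 4),
  course_duration = c(12, 12, 12, 12, 12),
  course_topic = c("Song", "Dance", "Song", "Painting", "Pottery")
)
depends_on(student_course, c("course_duration", "course_topic"), "course_code")

# Giving course 1 a second topic breaks the dependency
student_course$course_topic[3] <- "Drama"
depends_on(student_course, c("course_duration", "course_topic"), "course_code")
```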

Value

Examples


student_course <- data.frame(
  student_id = 1:5,
  course_code = c(1, 2, 1, 3, 4),
  course_duration = c(12, 12, 12, 12, 12),
  course_topic = c("Song", "Dance", "Song", "Painting", "Pottery")
)

# Check that each `course_code` corresponds to exactly one combination of
# `course_duration` and `course_topic`
expect_depends(
  c(course_duration, course_topic),
  on = course_code,
  data = student_course
)

Create an expectation from a check function

Description

expect_make() creates an expectation from a vectorised checking function to allow simple generation of domain specific data checks.

Usage

expect_make(
  func,
  func_desc = NULL,
  vars = FALSE,
  all = TRUE,
  env = caller_env()
)

Arguments

`func`	A function whose first argument takes a vector to check, and returns a logical vector of the same length with the results.
`func_desc`	A character function description to use in the expectation failure message.
`vars`	Included for backwards compatibility only.
`all`	Function to use to combine results for each vector.
`env`	The parent environment of the function, defaults to the calling environment of `expect_make()`.

Value

An expect_*() style function.

Examples

# Create a custom check
chk_binary <- function(x) {
  suppressWarnings(as.integer(x) %in% 0:1)
}

# Create custom expectation function
expect_binary <- expect_make(chk_binary)

# Validate a data frame
try(expect_binary(vs, data = mtcars))
try(expect_binary(cyl, data = mtcars))
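The idea behind expect_make() is a function factory: it takes a check function and returns a new function that applies that check to data. The base R sketch below shows the pattern only. `make_expectation()` is hypothetical, takes the column name as a string, and raises an ordinary error instead of signalling a testthat expectation.

```r
# Hypothetical factory in the spirit of expect_make(); not testdat's code
make_expectation <- function(func, func_desc = deparse(substitute(func))) {
  force(func_desc)  # evaluate the default description now
  function(var, data) {
    ok <- func(data[[var]])
    if (!all(ok)) {
      stop(sprintf("`%s` failed %s for %d record(s)", var, func_desc, sum(!ok)),
           call. = FALSE)
    }
    invisible(data)
  }
}

chk_binary <- function(x) {
  suppressWarnings(as.integer(x) %in% 0:1)
}
expect_binary_sketch <- make_expectation(chk_binary)
expect_binary_sketch("vs", mtcars)        # passes silently
try(expect_binary_sketch("cyl", mtcars))  # error: cyl is not binary
```

Returning the data invisibly mirrors how the test data helpers in this package support piping.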

Expectations: generic helpers

Description

These functions allow for testing of multiple columns (vars) of a data frame (data), with an optional filter (flt), using an arbitrary function (func).

Usage

expect_all(
  vars,
  func,
  flt = TRUE,
  data = get_testdata(),
  args = list(),
  func_desc = NULL
)

expect_any(
  vars,
  func,
  flt = TRUE,
  data = get_testdata(),
  args = list(),
  func_desc = NULL
)

Arguments

`vars`	<`tidy-select`> A set of columns to test.
`func`	A function to use for testing that takes a vector as the first argument and returns a logical vector of the same length showing whether an element passed or failed.
`flt`	<`data-masking`> A filter specifying a subset of the data frame to test.
`data`	A data frame to test. The global test data is used by default.
`args`	A named list of arguments to pass to `func`.
`func_desc`	A human friendly description of `func` to use in the expectation failure message.

Details

expect_allany() tests the columns in vars to see whether func returns TRUE for each of them, and combines the results for each row using the function in allany. Both expect_all() and expect_any() are wrappers around expect_allany().
expect_all() tests the vars to see whether func returns TRUE for all of them (i.e. whether the conjunction of results of applying func to each of the vars is TRUE).
expect_any() tests the vars to see whether func returns TRUE for any of them (i.e. whether the disjunction of the results of applying func to each of the vars is TRUE).
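The per-row combination described above can be reproduced for the first example with a logical matrix and `apply()`. This is a base R sketch of the idea, not testdat's implementation.

```r
# Base R sketch of the first example; not testdat's implementation
cars4 <- mtcars[mtcars$cyl == 4, c("disp", "hp")]
# One logical column per variable: is the value within [0, 100]?
passed <- as.matrix(cars4 >= 0 & cars4 <= 100)
# expect_all() idea: per row, every selected column must pass
row_all <- apply(passed, 1, all)
# expect_any() idea: per row, at least one selected column must pass
row_any <- apply(passed, 1, any)
all(row_all)
all(row_any)
```

Both are `FALSE` for `mtcars`: several 4-cylinder cars exceed 100 on one measure, and the Volvo 142E exceeds 100 on both, which is why the examples wrap the expectations in `try()`.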

Value

Examples

# Check that every 4-cylinder car has an engine displacement of < 100 cubic
# inches *AND* < 100 horsepower
try(
  expect_all(
    vars = c(disp, hp),
    func = chk_range,
    flt = (cyl == 4),
    args = list(min = 0, max = 100),
    data = mtcars
  )
)

# Check that every 4-cylinder car has an engine displacement of < 100 cubic
# inches *OR* < 100 horsepower
try(
  expect_any(
    vars = c(disp, hp),
    func = chk_range,
    flt = (cyl == 4),
    args = list(min = 0, max = 100),
    data = mtcars
  )
)

# Check that all variables are numeric:
try(expect_all(
  vars = everything(),
  func = is.numeric,
  data = iris
))

Get/set test data

Description

A global test data set is used to avoid having to re-specify the testing data frame in every test. These functions get and set the global data or set the data for the current context.

Usage

set_testdata(data, quosure = TRUE)

get_testdata()

with_testdata(data, code, quosure = TRUE)

data %E>% code

Arguments

data

Data frame to be used.

quosure

If TRUE, the default, the data frame is stored as a quosure and lazily evaluated when get_testdata() is called, so get_testdata() will return the current state of the data frame.

If FALSE, the data frame will be copied and get_testdata() will return the state of the data frame at the time set_testdata() was called.

code

Code to execute with the test data set to data.
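The distinction drawn by `quosure` is between storing an expression and storing a value. The base R sketch below uses `quote()` as a simplified stand-in for an rlang quosure (a real quosure also records the environment to evaluate in); it is an analogy, not testdat's internals.

```r
df <- data.frame(x = 1:3)
stored_copy <- df         # like quosure = FALSE: a snapshot taken now
stored_lazy <- quote(df)  # like quosure = TRUE: re-evaluated on each access

df$x <- df$x * 10L        # modify the data after it was "set"
stored_copy$x             # still the original values
eval(stored_lazy)$x       # reflects the modification
```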

Value

set_testdata() invisibly returns the previous test data. The test data is returned as it was stored - if it was stored with quosure = TRUE it will be returned as a quosure.
get_testdata() returns the current test data frame.
with_testdata() and the test data pipe %E>% invisibly return the input data for easy piping.

Examples

set_testdata(mtcars)
head(get_testdata())

with_testdata(iris, {
  x <- get_testdata()
  print(head(x))
})

mtcars %E>%
  expect_base(mpg, TRUE) %E>%
  expect_range(carb, 1, 8)

Expectations: labels

Description

Test whether variables in a data frame are labelled in a given way.

Usage

expect_labels(
  vars,
  val_labels = NULL,
  var_label = NULL,
  flt = TRUE,
  data = get_testdata()
)

Arguments

`vars`	<`tidy-select`> A set of columns to test.
`val_labels`	What value label check should be performed? One of: A character vector of expected value labels. A named vector of expected label-value pairs. `TRUE` to test for the presence of value labels in general. `FALSE` to test for the absence of value labels. `NULL` to ignore value labels when checking.
`var_label`	What variable label check should be performed? One of: A character vector of expected variable labels. `TRUE` to test for the presence of a variable label. `FALSE` to test for the absence of a variable label. `NULL` to ignore the variable label when checking.
`flt`	<`data-masking`> A filter specifying a subset of the data frame to test.
`data`	A data frame to test. The global test data is used by default.

Value

Examples


df <- data.frame(
  x = labelled::labelled(c("M", "M", "F"), c(Male = "M", Female = "F"), "Sex"),
  y = labelled::labelled(c("M", "M", "F"), c(Male = "M", Female = "F", Other = "X")),
  z = c("M", "M", "F")
)

# Check for a value-label pairing
try(expect_labels(x, c(Male = "M"), data = df))

# Check that two variables have the same value labels
expect_labels(x, labelled::val_labels(df$y), data = df) # N.B. This passes!

# Check for the presence of a particular label
try(expect_labels(x, "Male", data = df))
expect_labels(x, var_label = "Sex", data = df)

# Check that a variable is labelled at all
try(expect_labels(z, val_labels = TRUE, data = df))
try(expect_labels(z, var_label = TRUE, data = df))

# Check that a variable isn't labelled
expect_labels(z, val_labels = FALSE, data = df)
expect_labels(z, var_label = FALSE, data = df)

Output `ListReporter` results in Excel format

Description

Output formatted ListReporter results to an Excel workbook using openxlsx. The workbook consists of a summary sheet showing aggregated results for each context, and one sheet per context showing details of each unsuccessful test.

Usage

output_results_excel(results, file)

Arguments

`results`	An object of class `testthat_results`, e.g. output from `test_dir()` or `test_file()`.
`file`	Output file name

Value

The return value of openxlsx::saveWorkbook().

Examples

## Not run: 
# Output the results from running all tests in a directory
x <- test_dir(".")
output_results_excel(x, "Test results.xlsx")

## End(Not run)

Expectations: patterns

Description

Test whether variables in a data frame conform to a given pattern.

Usage

expect_regex(vars, pattern, flt = TRUE, data = get_testdata())

expect_max_length(vars, len, flt = TRUE, data = get_testdata())

Arguments

`vars`	<`tidy-select`> A set of columns to test.
`pattern`	A str_detect() pattern to match.
`flt`	<`data-masking`> A filter specifying a subset of the data frame to test.
`data`	A data frame to test. The global test data is used by default.
`len`	Maximum string length.

Value

Examples


sales <- data.frame(
  sale_id = 1:5,
  item_code = c("a_1", "b_2", "c_2", NA, "NULL")
)

try(expect_regex(item_code, "[a-z]_[0-9]", data = sales)) # Codes match regex
try(expect_max_length(item_code, 3, data = sales)) # Code width <= 3

Expectations: proportions

Description

These test the proportion of data in a data frame satisfying some condition. The generic functions, expect_prop_lte() and expect_prop_gte(), can be used with any arbitrary function. The chk_*() functions, like chk_values(), are useful in this regard.

Usage

expect_prop_lte(
  var,
  func,
  prop,
  flt = TRUE,
  data = get_testdata(),
  args = list(),
  func_desc = NULL
)

expect_prop_gte(
  var,
  func,
  prop,
  flt = TRUE,
  data = get_testdata(),
  args = list(),
  func_desc = NULL
)

expect_prop_nmiss(
  var,
  prop,
  miss = getOption("testdat.miss"),
  flt = TRUE,
  data = get_testdata()
)

expect_prop_values(var, prop, ..., flt = TRUE, data = get_testdata())

Arguments

`var`	An unquoted column name to test.
`func`	A function to use for testing that takes a vector as the first argument and returns a logical vector of the same length showing whether an element passed or failed.
`prop`	The proportion of the data frame expected to satisfy the condition.
`flt`	<`data-masking`> A filter specifying a subset of the data frame to test.
`data`	A data frame to test. The global test data is used by default.
`args`	A named list of arguments to pass to `func`.
`func_desc`	A human-friendly description of `func` to use in the expectation failure message.
`miss`	A vector of values to be treated as missing. The testdat.miss option is used by default.
`...`	Vectors of valid values.

Details

Because these functions use quasi-quotation, a new function that wraps one of the generics, such as expect_prop_gte(), must pass its var argument through with the embracing operator {{ }}, which defuses the argument and injects it into the call. See the Examples section for an example.

Value

Examples

sales <- data.frame(
  sale_id = 1:5,
  date = c("20200101", "20200101", "20200102", "20200103", "2020003"),
  sale_price = c(10, 20, 30, 40, -1),
  book_title = c(
    "Phenomenology of Spirit",
    NA,
    "Critique of Practical Reason",
    "Spirit of Trust",
    "Empiricism and the Philosophy of Mind"
  ),
  stringsAsFactors = FALSE
)

# Create a custom expectation
expect_prop_length <- function(var, len, prop, data) {
  expect_prop_gte(
    var = {{var}}, # Notice the use of the embracing operator
    func = chk_max_length,
    prop = prop,
    data = data,
    args = list(len = len),
    func_desc = "length_check"
  )
}

# Use it to check that most dates are at most 8 characters wide
expect_prop_length(date, 8, 0.9, sales)

# Check that most prices are whole numbers from 1 to 100
try(expect_prop_values(sale_price, 0.9, 1:100, data = sales))
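To make the threshold arithmetic concrete, here is a minimal base R sketch of a ">= prop" check. It assumes the proportion is the share of filtered values for which `func` returns TRUE; the helper `prop_gte_sketch()` is hypothetical and is not testdat's implementation, and the anonymous length function stands in for chk_max_length().

```r
# Hypothetical sketch of a ">= prop" proportion check (not testdat's code)
prop_gte_sketch <- function(x, func, prop, flt = TRUE, args = list()) {
  x <- x[flt]                                # keep only the filtered values
  passed <- do.call(func, c(list(x), args))  # one TRUE/FALSE per element
  mean(passed) >= prop                       # share passing vs. threshold
}

sales <- data.frame(
  date = c("20200101", "20200101", "20200102", "20200103", "2020003"),
  sale_price = c(10, 20, 30, 40, -1),
  stringsAsFactors = FALSE
)

# 5 of 5 dates are at most 8 characters wide: 1.0 >= 0.9, so TRUE
dates_ok <- prop_gte_sketch(sales$date, function(x, len) nchar(x) <= len,
                            0.9, args = list(len = 8))

# 4 of 5 prices are whole numbers from 1 to 100: 0.8 < 0.9, so FALSE
prices_ok <- prop_gte_sketch(sales$sale_price, function(x) x %in% 1:100, 0.9)
```

This mirrors the outcome of the examples: the custom length expectation passes, while the price check fails and is therefore wrapped in try().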

Expectations: text

Description

Test whether variables in a data frame contain common NULL placeholders.

Usage

expect_text_miss(
  vars,
  miss = getOption("testdat.miss_text"),
  flt = TRUE,
  data = get_testdata()
)

expect_text_nmiss(
  vars,
  miss = getOption("testdat.miss_text"),
  flt = TRUE,
  data = get_testdata()
)

Arguments

`vars`	<`tidy-select`> A set of columns to test.
`miss`	A vector of values to be treated as missing. The testdat.miss or testdat.miss_text option is used by default.
`flt`	<`data-masking`> A filter specifying a subset of the data frame to test.
`data`	A data frame to test. The global test data is used by default.

Value

Examples


sales <- data.frame(
  sale_id = 1:5,
  date = c("20200101", "null", "20200102", "20200103", "null"),
  sale_price = c(10, -1, 30, 40, -1)
)

# Dates not missing
try(expect_text_nmiss(date, data = sales))

# Date missing if price negative
try(expect_text_miss(date, flt = sale_price %in% -1, data = sales))
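The two expectations are complements: expect_text_nmiss() requires that no value is a placeholder, while expect_text_miss() requires that every value (within the filter) is one. The base R sketch below illustrates that logic with `%in%`; the `placeholders` vector is hypothetical, since the real default comes from getOption("testdat.miss_text") and may contain different values or match differently.

```r
# Hypothetical placeholder list; testdat's default is
# getOption("testdat.miss_text") and may differ
placeholders <- c("", "null", "NULL", "NA", "N/A")

sales <- data.frame(
  date = c("20200101", "null", "20200102", "20200103", "null"),
  sale_price = c(10, -1, 30, 40, -1),
  stringsAsFactors = FALSE
)

is_placeholder <- sales$date %in% placeholders  # per-row flag
flt <- sales$sale_price %in% -1                 # rows selected by the filter

nmiss_ok <- all(!is_placeholder)     # like expect_text_nmiss(date): rows 2 and 5 fail
miss_ok <- all(is_placeholder[flt])  # like expect_text_miss(date, flt = ...)
```

Here `nmiss_ok` is FALSE and `miss_ok` is TRUE: every row with a negative price has a placeholder date.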

Expectations: uniqueness

Description

These functions test variables for uniqueness.

Usage

expect_unique(
  vars,
  exclude = getOption("testdat.miss"),
  flt = TRUE,
  data = get_testdata()
)

expect_unique_across(
  vars,
  exclude = getOption("testdat.miss"),
  flt = TRUE,
  data = get_testdata()
)

expect_unique_combine(
  vars,
  exclude = getOption("testdat.miss"),
  flt = TRUE,
  data = get_testdata()
)

Arguments

`vars`	<`tidy-select`> A set of columns to test.
`exclude`	A vector of values to exclude from the uniqueness check. The testdat.miss option is used by default. To include all values, set `exclude = NULL`.
`flt`	<`data-masking`> A filter specifying a subset of the data frame to test.
`data`	A data frame to test. The global test data is used by default.

Details

expect_unique() tests a set of columns (vars) and fails if the combined columns do not uniquely identify each row.
expect_unique_across() tests a set of columns (vars) and fails if any row contains the same value in more than one of the columns.
expect_unique_combine() tests a set of columns (vars) and fails if any value appears more than once anywhere across the columns combined.

By default the uniqueness check excludes missing values (as specified by the testdat.miss option). Setting exclude = NULL will include all values.
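The three notions of uniqueness differ in which values are compared with each other. The base R sketch below illustrates them with anyDuplicated(), using the data from the examples. The helpers `unique_key()`, `unique_across()` and `unique_combine()` are hypothetical and are not testdat's implementation; in particular, `unique_key()` covers only the single-column case, and testdat's handling of `exclude` for multi-column keys may differ.

```r
# Hypothetical base R sketch of the three uniqueness semantics
students <- data.frame(
  student_id = c(1:5, NA, NA),
  apple  = c(1, 1, 1, 1, 99, NA, NA),
  orange = c(2, 3, 2, 3, 99, NA, NA),
  banana = c(3, 2, 3, 2, 99, NA, NA),
  phone1 = c(123, 456, 789, 987, 654, NA, NA),
  phone2 = c(345, 678, 987, 567, 000, NA, NA)
)

# Like expect_unique() on one column: excluded values dropped, rest must not repeat
unique_key <- function(x, exclude = NA) {
  x <- x[!x %in% exclude]
  anyDuplicated(x) == 0
}

# Like expect_unique_across(): within each row, no value repeats across columns
unique_across <- function(df, exclude = NA) {
  apply(as.matrix(df), 1, function(row) {
    row <- row[!row %in% exclude]
    anyDuplicated(row) == 0
  })
}

# Like expect_unique_combine(): no value repeats anywhere in the pooled columns
unique_combine <- function(df, exclude = NA) {
  x <- unlist(df, use.names = FALSE)
  x <- x[!x %in% exclude]
  anyDuplicated(x) == 0
}

fruit <- students[c("apple", "orange", "banana")]
phones <- students[c("phone1", "phone2")]

unique_key(students$student_id)                  # TRUE: NAs dropped
unique_key(students$student_id, exclude = NULL)  # FALSE: NA appears twice
all(unique_across(fruit))                        # FALSE: row 5 repeats 99
all(unique_across(fruit, exclude = c(99, NA)))   # TRUE: 99 codes ignored
unique_combine(phones)                           # FALSE: 987 appears twice
```

Note that `x %in% NULL` is FALSE for every element, so `exclude = NULL` keeps all values, and anyDuplicated() treats repeated NAs as duplicates.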

Value

Examples


student_fruit_preferences <- data.frame(
  student_id = c(1:5, NA, NA),
  apple = c(1, 1, 1, 1, 99, NA, NA),
  orange = c(2, 3, 2, 3, 99, NA, NA),
  banana = c(3, 2, 3, 2, 99, NA, NA),
  phone1 = c(123, 456, 789, 987, 654, NA, NA),
  phone2 = c(345, 678, 987, 567, 000, NA, NA)
)

# Check that key is unique, excluding NAs by default
expect_unique(student_id, data = student_fruit_preferences)

# Check that key is unique, including NAs
try(expect_unique(student_id, exclude = NULL, data = student_fruit_preferences))

# Check each fruit has unique preference number
try(
expect_unique_across(
  c(apple, orange, banana),
  data = student_fruit_preferences
)
)

# Check each fruit has unique preference number, allowing multiple 99 (item
# skipped) codes
expect_unique_across(
  c(apple, orange, banana),
  exclude = c(99, NA), data = student_fruit_preferences
)

# Check that each phone number appears at most once
try(expect_unique_combine(c(phone1, phone2), data = student_fruit_preferences))

Expectations: values

Description

Test whether variables in a data frame contain only certain values.

Usage

expect_values(
  vars,
  ...,
  miss = getOption("testdat.miss"),
  flt = TRUE,
  data = get_testdata()
)

expect_range(vars, min, max, ..., flt = TRUE, data = get_testdata())

Arguments

`vars`	<`tidy-select`> A set of columns to test.
`...`	Vectors of valid values.
`miss`	A vector of values to be treated as missing. The testdat.miss or testdat.miss_text option is used by default.
`flt`	<`data-masking`> A filter specifying a subset of the data frame to test.
`data`	A data frame to test. The global test data is used by default.
`min`	Minimum value for range check.
`max`	Maximum value for range check.

Value

Examples


sales <- data.frame(
  sale_id = 1:5,
  date = c("20200101", "20200101", "20200102", "20200103", "20220101"),
  sale_price = c(10, 20, 30, 40, -1)
)

try(expect_values(date, 20000000:20210000, data = sales)) # Dates between 2000 and 2021
try(expect_range(sale_price, min = 0, max = Inf, data = sales)) # Prices non-negative
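The date example compares a character column against an integer vector. In base R this works because `%in%` (match()) converts both sides to character before comparing. The sketch below shows the element-wise checks in base R only; `values_ok()` and `range_ok()` are hypothetical helpers, not testdat functions, and they ignore the `miss` and `flt` arguments.

```r
# Hypothetical element-wise checks, not testdat's implementation
values_ok <- function(x, ...) x %in% c(...)
range_ok <- function(x, min, max) x >= min & x <= max

sales <- data.frame(
  sale_id = 1:5,
  date = c("20200101", "20200101", "20200102", "20200103", "20220101"),
  sale_price = c(10, 20, 30, 40, -1),
  stringsAsFactors = FALSE
)

# "20220101" lies outside 20000000:20210000, so only the last date fails
date_flags <- values_ok(sales$date, 20000000:20210000)

# -1 is below the minimum of 0, so only the last price fails
price_flags <- range_ok(sales$sale_price, min = 0, max = Inf)
```

In both cases only the fifth record fails, which is why each expectation in the examples is wrapped in try().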
