1 - Add metadata to datasets of individual human records

Appending appropriate metadata to datasets of individual unit records can facilitate partial automation of some modelling tasks. This tutorial describes how modules from the youthvars R package can help you to add metadata to a youth mental health dataset so that it can be more readily used by other readyforwhatsnext modules.

This section renders a vignette article from the youthvars library.

Note: This vignette is illustrated with fake data. The dataset explored in this example should not be used to inform decision-making.

library(ready4)
library(youthvars)

The youthvars package provides two ready4 framework modules, YouthvarsProfile and YouthvarsSeries, that form part of the ready4 economic model of youth mental health. Both modules extend the Ready4useDyad module and can be used to help describe key structural properties of youth mental health datasets.

Ingest data

To start, we ingest X, a Ready4useDyad (a dataset and data dictionary pair) that we download from a remote repository.

X <- ready4use::Ready4useRepos(dv_nm_1L_chr = "fakes",
                               dv_ds_nm_1L_chr = "https://doi.org/10.7910/DVN/W95KED",
                               dv_server_1L_chr = "dataverse.harvard.edu") %>%
  ingest(fls_to_ingest_chr = "ymh_clinical_dyad_r4",
         metadata_1L_lgl = FALSE)

Add metadata

If a dataset is cross-sectional, or we wish to treat it as if it were (i.e., ignoring data collection rounds), we can create Y, an instance of the YouthvarsProfile module, to add minimal metadata (the name of the unique identifier variable).

Y <- YouthvarsProfile(a_Ready4useDyad = X, id_var_nm_1L_chr = "fkClientID")

If the temporal dimension of the dataset is important, it may instead be preferable to transform X into a YouthvarsSeries module instance. YouthvarsSeries objects contain all of the fields of YouthvarsProfile objects, plus additional fields that are specific to longitudinal datasets. For example:

- timepoint_var_nm_1L_chr specifies the name of the data-collection timepoint variable.
- timepoint_vals_chr specifies the values of that variable.
- participation_var_1L_chr specifies the desired name of a yet-to-be-created variable. This variable will summarise the data-collection timepoints for which each unit record supplied data.

Z <- YouthvarsSeries(a_Ready4useDyad = X,
                     id_var_nm_1L_chr = "fkClientID",
                     participation_var_1L_chr = "participation",
                     timepoint_vals_chr = c("Baseline","Follow-up"),
                     timepoint_var_nm_1L_chr = "round")
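The participation variable is generated by youthvars itself, and its exact values are not shown in this article. Purely as an illustration, the base R sketch below derives a comparable per-client summary from a toy long-format dataset (one row per client per data-collection round). The column names echo the arguments above. The toy data, the "and"-joined labels and the logic are assumptions, not the youthvars implementation.

```r
# Toy long-format data (hypothetical): one row per client per data-collection round
ds_df <- data.frame(
  fkClientID = c("C1", "C1", "C2", "C3", "C3"),
  round = c("Baseline", "Follow-up", "Baseline", "Baseline", "Follow-up"),
  stringsAsFactors = FALSE
)
timepoint_vals_chr <- c("Baseline", "Follow-up")

# For each client, list (in timepoint order) the rounds for which they supplied data
rounds_by_id <- split(ds_df$round, ds_df$fkClientID)
participation_by_id <- vapply(rounds_by_id, function(rounds_chr) {
  paste(timepoint_vals_chr[timepoint_vals_chr %in% rounds_chr], collapse = " and ")
}, character(1))

# Attach the per-client summary to every row belonging to that client
ds_df$participation <- unname(participation_by_id[ds_df$fkClientID])
print(ds_df)
```

Clients C1 and C3 are labelled as having data at both rounds, and C2 at baseline only. A summary of this kind is what allows records to be grouped by data completeness, as in the "Outcomes by data completeness" table.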

YouthvarsProfile methods

Inspect data

We can now use the renew method to specify the variables for which we would like to prepare descriptive statistics. The variables to be profiled are specified in the profile_chr argument. The number of decimal digits (default = 3) for numeric values in the generated summary tables can be specified with nbr_of_digits_1L_int.

Y <- renew(Y, nbr_of_digits_1L_int = 2L, profile_chr = c("d_age","d_sexual_ori_s","d_studying_working"))

We can now view the descriptive statistics we created in the previous step.

Y %>%
  exhibit(profile_idx_int = 1L, scroll_box_args_ls = list(width = "100%"))
Descriptive summary (N = 1711)
Age
  Mean (SD) | 17.64 (3.09)
  Median (Q1, Q3) | 18.00 (15.00, 20.00)
  Min - Max | 12.00 - 25.00
  Missing | 0.00
Sexual orientation
  Heterosexual | 1178.00 (71.74%)
  Other | 464.00 (28.26%)
  Missing | 69.00
Education and employment status
  Not studying or working | 311.00 (18.75%)
  Studying and working | 451.00 (27.19%)
  Studying only | 572.00 (34.48%)
  Working only | 325.00 (19.59%)
  Missing | 52.00
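Reading the table: numeric variables are summarised as mean (SD), median (Q1, Q3), range and a missing count. Each category of a categorical variable is shown as a count with a percentage. The percentages appear to exclude missing values from the denominator; for example, 1178 / (1178 + 464) = 71.74%.

The base R helpers below are a hypothetical sketch that reproduces this layout with a configurable number of digits. They are not the youthvars code. The quantile definition used (R's default, type 7) is an assumption and may differ from the one youthvars uses.

```r
# Hypothetical helpers mimicking the summary table layout; not youthvars code
fmt_num <- function(v, digits_1L_int) formatC(as.numeric(v), format = "f", digits = digits_1L_int)

summarise_numeric <- function(x, digits_1L_int = 2L) {
  obs <- x[!is.na(x)]
  # R's default (type 7) quantiles; youthvars may use a different definition
  q <- stats::quantile(obs, probs = c(0.25, 0.5, 0.75), names = FALSE)
  c("Mean (SD)" = paste0(fmt_num(mean(obs), digits_1L_int), " (",
                         fmt_num(stats::sd(obs), digits_1L_int), ")"),
    "Median (Q1, Q3)" = paste0(fmt_num(q[2], digits_1L_int), " (", fmt_num(q[1], digits_1L_int),
                               ", ", fmt_num(q[3], digits_1L_int), ")"),
    "Min - Max" = paste0(fmt_num(min(obs), digits_1L_int), " - ", fmt_num(max(obs), digits_1L_int)),
    "Missing" = fmt_num(sum(is.na(x)), digits_1L_int))
}

summarise_categorical <- function(x, digits_1L_int = 2L) {
  counts <- table(x)  # table() drops NA by default, so percentages use non-missing cases only
  pcts <- 100 * as.numeric(counts) / sum(counts)
  c(stats::setNames(paste0(fmt_num(counts, digits_1L_int), " (", fmt_num(pcts, digits_1L_int), "%)"),
                    names(counts)),
    "Missing" = fmt_num(sum(is.na(x)), digits_1L_int))
}

summarise_numeric(c(12, 15, 18, 20, 25, NA))
summarise_categorical(c(rep("Heterosexual", 1178), rep("Other", 464), rep(NA, 69)))
```

Given 1178 "Heterosexual" values, 464 "Other" values and 69 missing values, summarise_categorical() returns 1178.00 (71.74%), 464.00 (28.26%) and 69.00. These match the sexual orientation rows above.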

We can also plot the distributions of selected variables in our dataset.

depict(Y, var_nms_chr = c("c_sofas"), labels_chr = c("SOFAS"))
Figure: SOFAS total scores

YouthvarsSeries methods

Validate data

To explore longitudinal data, we first need to use the ratify method to ensure that Z is appropriately configured for methods that examine datasets reporting measures at two timepoints.

Z <- ratify(Z,
            type_1L_chr = "two_timepoints")
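The article does not list the checks that ratify performs when type_1L_chr = "two_timepoints". As a hypothetical sketch only, the base R function below shows the kind of structural conditions a two-timepoint dataset would be expected to meet:

- exactly two permitted timepoint values;
- no unexpected timepoint values;
- at most one row per ID per timepoint.

The function name and messages are invented for illustration.

```r
# Hypothetical structural checks for a two-timepoint dataset; not the ratify() implementation
check_two_timepoints <- function(ds_df, id_var_nm_1L_chr, timepoint_var_nm_1L_chr, timepoint_vals_chr) {
  if (length(timepoint_vals_chr) != 2) stop("Exactly two timepoint values are required.")
  tps_chr <- ds_df[[timepoint_var_nm_1L_chr]]
  unexpected_chr <- unique(tps_chr[!tps_chr %in% timepoint_vals_chr])
  if (length(unexpected_chr) > 0)
    stop("Unexpected timepoint values: ", paste(unexpected_chr, collapse = ", "))
  # Each unit record should contribute at most one row per timepoint
  if (anyDuplicated(ds_df[, c(id_var_nm_1L_chr, timepoint_var_nm_1L_chr)]) > 0)
    stop("At least one ID has more than one row for the same timepoint.")
  invisible(TRUE)
}

ds_df <- data.frame(fkClientID = c("C1", "C1", "C2"),
                    round = c("Baseline", "Follow-up", "Baseline"),
                    stringsAsFactors = FALSE)
check_two_timepoints(ds_df, "fkClientID", "round", c("Baseline", "Follow-up"))
```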

Inspect data

We can now use the renew method to specify the variables for which we would like to prepare descriptive statistics. The variables to be profiled are specified in arguments beginning with “compare_”:

- Use compare_ptcpn_chr to compare variables by whether cases reported data at one or both timepoints.
- Use compare_by_time_chr to compare the summary statistics of variables by timepoint (e.g. at baseline and follow-up).

If you wish these comparisons to report p values, use the compare_ptcpn_with_test_chr and compare_by_time_with_test_chr arguments.

Z <- renew(Z,
           compare_by_time_chr = c("d_age","d_sexual_ori_s","d_studying_working"),
           compare_by_time_with_test_chr = c("k6_total", "phq9_total", "bads_total"),
           compare_ptcpn_with_test_chr = c("k6_total", "phq9_total", "bads_total")) 

The tables generated in the preceding step can be inspected using the exhibit method.

Z %>%
  exhibit(profile_idx_int = 1L,
          scroll_box_args_ls = list(width = "100%"))
Outcomes by data completeness
Statistic | Baseline only (N = 1068) | Baseline and follow-up (N = 643)
Kessler Psychological Distress Scale (6 Dimension), p = 0.001
  Mean (SD) | 12.153 (5.409) | 11.069 (5.778)
  Median (Q1, Q3) | 12.000 (8.000, 16.000) | 11.000 (7.000, 15.000)
  Min - Max | 0.000 - 24.000 | 0.000 - 24.000
  Missing | 0.000 | 3.000
Patient Health Questionnaire, p = 0.000
  Mean (SD) | 12.632 (6.086) | 11.194 (6.434)
  Median (Q1, Q3) | 13.000 (8.000, 17.000) | 11.000 (6.000, 16.000)
  Min - Max | 0.000 - 27.000 | 0.000 - 27.000
  Missing | 1.000 | 5.000
Behavioural Activation for Depression Scale, p = 0.010
  Mean (SD) | 79.814 (26.478) | 83.571 (25.809)
  Median (Q1, Q3) | 79.000 (62.000, 95.250) | 84.000 (66.000, 101.000)
  Min - Max | 0.000 - 150.000 | 0.000 - 150.000
  Missing | 1.000 | 10.000
Z %>%
  exhibit(profile_idx_int = 2L,
          scroll_box_args_ls = list(width = "100%"))
Outcomes by data collection round
Statistic | Baseline (N = 1068) | Follow-up (N = 643)
Age
  Mean (SD) | 17.555 (3.090) | 17.770 (3.091)
  Median (Q1, Q3) | 17.000 (15.000, 20.000) | 18.000 (16.000, 20.000)
  Min - Max | 12.000 - 25.000 | 12.000 - 25.000
  Missing | 0.000 | 0.000
Sexual orientation
  Heterosexual | 738.000 (71.860%) | 440.000 (71.545%)
  Other | 289.000 (28.140%) | 175.000 (28.455%)
  Missing | 41.000 | 28.000
Education and employment status
  Not studying or working | 159.000 (15.347%) | 152.000 (24.398%)
  Studying and working | 305.000 (29.440%) | 146.000 (23.435%)
  Studying only | 405.000 (39.093%) | 167.000 (26.806%)
  Working only | 167.000 (16.120%) | 158.000 (25.361%)
  Missing | 32.000 | 20.000
Z %>%
  exhibit(profile_idx_int = 3L,
          scroll_box_args_ls = list(width = "100%"))
Outcomes by data collection round (with p values)
Statistic | Baseline (N = 1068) | Follow-up (N = 643)
Kessler Psychological Distress Scale (6 Dimension), p = 0.000
  Mean (SD) | 12.082 (5.603) | 10.100 (5.665)
  Median (Q1, Q3) | 12.000 (8.000, 16.000) | 10.000 (6.000, 14.000)
  Min - Max | 0.000 - 24.000 | 0.000 - 24.000
  Missing | 1.000 | 2.000
Patient Health Questionnaire, p = 0.000
  Mean (SD) | 12.646 (6.230) | 9.736 (6.210)
  Median (Q1, Q3) | 13.000 (8.000, 17.000) | 10.000 (5.000, 14.000)
  Min - Max | 0.000 - 27.000 | 0.000 - 27.000
  Missing | 4.000 | 2.000
Behavioural Activation for Depression Scale, p = 0.000
  Mean (SD) | 78.429 (25.608) | 89.615 (25.205)
  Median (Q1, Q3) | 78.000 (61.000, 95.000) | 88.000 (73.000, 106.000)
  Min - Max | 0.000 - 150.000 | 0.000 - 150.000
  Missing | 7.000 | 4.000

The depict method can create plots that compare numeric variables by timepoint.

depict(Z,
       type_1L_chr = "by_time",
       var_nms_chr = c("c_sofas"),
       label_fill_1L_chr = "Time",
       labels_chr = c("SOFAS"),
       y_label_1L_chr = "")
Figure: SOFAS total scores by data collection round

Share data

If, and only if, the dataset you are working with is appropriate for public dissemination (e.g. it is synthetic data), you can use the following workflow to share it. We share the dataset created for this example using the share method. The repository to which we wish to publish the dataset (and for which we have write permissions) is specified in a Ready4useRepos object.

A <- ready4use::Ready4useRepos(gh_repo_1L_chr = "ready4-dev/youthvars", # Replace with your repository
                               gh_tag_1L_chr = "Documentation_0.0") # (need write permissions).
A <- share(A,
           obj_to_share_xx = Z,
           fl_nm_1L_chr = "ymh_YouthvarsSeries")

Z is now available for download as the file ymh_YouthvarsSeries.RDS from the “Documentation_0.0” release of the youthvars package.

2 - Validate variable total scores

Vector-based classes can be used to help validate variable values. This tutorial describes how to do so with sub-module classes exported by the youthvars R package.

This section renders a vignette article from the youthvars library.

Variable classes and data integrity

The youthvars package includes a number of ready4 framework sub-module classes that form part of the ready4 economic model of youth mental health. The primary use of youthvars sub-modules is to quality assure the variables used in model input and output datasets by:

  1. facilitating automated data integrity checks that verify no impermissible values (e.g. utility scores greater than one) are present in source data, transformed data or results; and
  2. implementing rules-based automated selection and application of appropriate methods for each dataset variable.
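To make the first of these uses concrete, here is a base R sketch of a rules-based check that scans several dataset columns against their permitted ranges. The ranges are those documented below for the PHQ-9 (0 to 27), the K6 with US scoring (0 to 24) and the BADS (0 to 150). The column names and the helper are hypothetical, and this is not how youthvars performs its checks internally.

```r
# Permitted ranges taken from the class definitions in this article
bounds_df <- data.frame(
  var_nm_chr = c("phq9_total", "k6_total", "bads_total"),
  min_dbl = c(0, 0, 0),
  max_dbl = c(27, 24, 150),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per non-missing value outside its variable's permitted range
find_out_of_range <- function(ds_df, bounds_df) {
  res_ls <- lapply(seq_len(nrow(bounds_df)), function(i) {
    vals <- ds_df[[bounds_df$var_nm_chr[i]]]
    bad_int <- which(!is.na(vals) & (vals < bounds_df$min_dbl[i] | vals > bounds_df$max_dbl[i]))
    if (length(bad_int) == 0) return(NULL)
    data.frame(var_nm_chr = bounds_df$var_nm_chr[i], row_int = bad_int, value = vals[bad_int],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, res_ls)  # NULL when no problems are found
}

# Toy data containing two deliberately impossible values
ds_df <- data.frame(phq9_total = c(3L, 28L, NA),
                    k6_total = c(0L, 24L, 25L),
                    bads_total = c(10L, 150L, 0L))
find_out_of_range(ds_df, bounds_df)
```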

Included sub-module classes

The initial set of sub-module classes in the youthvars package consists of classes for health utility measures, such as Assessment of Quality of Life Six Dimension (Adolescent) health utility, and one class for each of the predictors used in the utility prediction algorithms included in the related youthu package.

Assessment of Quality of Life Six Dimension (Adolescent) Health Utility

The youthvars_aqol6d_adol class is defined for numeric vectors with a minimum value of 0.03 and maximum value of 1.0.

youthvars_aqol6d_adol(0.4)
#> [1] 0.4
#> attr(,"class")
#> [1] "youthvars_aqol6d_adol" "numeric"
youthvars_aqol6d_adol(c(0.03,0.2,1))
#> [1] 0.03 0.20 1.00
#> attr(,"class")
#> [1] "youthvars_aqol6d_adol" "numeric"

Non-numeric objects and values outside this range will produce errors.

youthvars_aqol6d_adol("0.5")
#> Error in make_new_youthvars_aqol6d_adol(x): is.numeric(x) is not TRUE
youthvars_aqol6d_adol(-0.1)
#> Error: All non-missing values in valid youthvars_aqol6d_adol object must be greater than or equal to 0.03.
youthvars_aqol6d_adol(1.2)
#> Error: All non-missing values in valid youthvars_aqol6d_adol object must be less than or equal to 1.
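The constructor behaviour shown above can be reproduced with a small closure factory. It performs a type check, then lower-bound and upper-bound checks on non-missing values, and finally prepends a class to the base type. The sketch below is a hypothetical base R illustration that mimics the printed error messages. It is not the youthvars source, and make_bounded_class and toy_aqol6d_adol are invented names.

```r
# Hypothetical factory mimicking the validated vector classes shown above; not youthvars code
make_bounded_class <- function(class_nm_1L_chr, type_check_fn, min_1L_dbl, max_1L_dbl) {
  force(class_nm_1L_chr); force(type_check_fn); force(min_1L_dbl); force(max_1L_dbl)
  function(x) {
    stopifnot(type_check_fn(x))  # e.g. is.numeric or is.integer
    vals <- x[!is.na(x)]         # bounds apply to non-missing values only
    if (any(vals < min_1L_dbl))
      stop(sprintf("All non-missing values in valid %s object must be greater than or equal to %s.",
                   class_nm_1L_chr, format(min_1L_dbl)), call. = FALSE)
    if (any(vals > max_1L_dbl))
      stop(sprintf("All non-missing values in valid %s object must be less than or equal to %s.",
                   class_nm_1L_chr, format(max_1L_dbl)), call. = FALSE)
    structure(x, class = c(class_nm_1L_chr, class(x)))  # e.g. c("toy_aqol6d_adol", "numeric")
  }
}

toy_aqol6d_adol <- make_bounded_class("toy_aqol6d_adol", is.numeric, 0.03, 1)
toy_aqol6d_adol(c(0.03, 0.2, 1))
try(toy_aqol6d_adol(1.2))
```

Passing is.integer and the bounds listed below would give analogues of the integer-based classes, for example make_bounded_class("toy_phq9", is.integer, 0, 27).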

Child Health Utility Nine Dimension - Australian Adolescent Scoring

The youthvars_chu9d_adolaus class is defined for numeric vectors with a minimum value of -0.2118 and maximum value of 1.0.

youthvars_chu9d_adolaus(0.4)
#> [1] 0.4
#> attr(,"class")
#> [1] "youthvars_chu9d_adolaus" "numeric"
youthvars_chu9d_adolaus(c(0.03,0.2,1))
#> [1] 0.03 0.20 1.00
#> attr(,"class")
#> [1] "youthvars_chu9d_adolaus" "numeric"

Non-numeric objects and values outside this range will produce errors.

youthvars_chu9d_adolaus("0.5")
#> Error in make_new_youthvars_chu9d_adolaus(x): is.numeric(x) is not TRUE
youthvars_chu9d_adolaus(-0.3)
#> Error: All non-missing values in valid youthvars_chu9d_adolaus object must be greater than or equal to -0.2118.
youthvars_chu9d_adolaus(1.2)
#> Error: All non-missing values in valid youthvars_chu9d_adolaus object must be less than or equal to 1.

Behavioural Activation for Depression Scale (BADS)

The youthvars_bads class is defined for integer vectors with a minimum value of 0 and maximum value of 150.

youthvars_bads(143L)
#> [1] 143
#> attr(,"class")
#> [1] "youthvars_bads" "integer"
youthvars_bads(as.integer(c(1,15,150)))
#> [1]   1  15 150
#> attr(,"class")
#> [1] "youthvars_bads" "integer"

Non-integers and values outside these ranges will produce errors.

youthvars_bads(22.5)
#> Error in make_new_youthvars_bads(x): is.integer(x) is not TRUE
youthvars_bads(-1L)
#> Error: All non-missing values in valid youthvars_bads object must be greater than or equal to 0.
youthvars_bads(160L)
#> Error: All non-missing values in valid youthvars_bads object must be less than or equal to 150.

Generalised Anxiety Disorder Scale (GAD-7)

The youthvars_gad7 class is defined for integer vectors with a minimum value of 0 and a maximum value of 21.

youthvars_gad7(15L)
#> [1] 15
#> attr(,"class")
#> [1] "youthvars_gad7" "integer"
youthvars_gad7(as.integer(c(0,14,21)))
#> [1]  0 14 21
#> attr(,"class")
#> [1] "youthvars_gad7" "integer"

Non-integers and values outside these ranges will produce errors.

youthvars_gad7(14.6)
#> Error in make_new_youthvars_gad7(x): is.integer(x) is not TRUE
youthvars_gad7(-1L)
#> Error: All non-missing values in valid youthvars_gad7 object must be greater than or equal to 0.
youthvars_gad7(22L)
#> Error: All non-missing values in valid youthvars_gad7 object must be less than or equal to 21.

Kessler Psychological Distress Scale (K6) - Australian Scoring System

The youthvars_k6_aus class is defined for integer vectors with a minimum value of 6 and a maximum value of 30.

youthvars_k6_aus(21L)
#> [1] 21
#> attr(,"class")
#> [1] "youthvars_k6_aus" "integer"
youthvars_k6_aus(as.integer(c(6,13,25)))
#> [1]  6 13 25
#> attr(,"class")
#> [1] "youthvars_k6_aus" "integer"

Non-integers and values outside these ranges will produce errors.

youthvars_k6_aus(11.2)
#> Error in make_new_youthvars_k6_aus(x): is.integer(x) is not TRUE
youthvars_k6_aus(1L)
#> Error: All non-missing values in valid youthvars_k6_aus object must be greater than or equal to 6.
youthvars_k6_aus(31L)
#> Error: All non-missing values in valid youthvars_k6_aus object must be less than or equal to 30.

Kessler Psychological Distress Scale (K6) - US Scoring System

The youthvars_k6 class is defined for integer vectors with a minimum value of 0 and a maximum value of 24.

youthvars_k6(21L)
#> [1] 21
#> attr(,"class")
#> [1] "youthvars_k6" "integer"
youthvars_k6(as.integer(c(0,13,24)))
#> [1]  0 13 24
#> attr(,"class")
#> [1] "youthvars_k6" "integer"

Non-integers and values outside these ranges will produce errors.

youthvars_k6(11.2)
#> Error in make_new_youthvars_k6(x): is.integer(x) is not TRUE
youthvars_k6(-1L)
#> Error: All non-missing values in valid youthvars_k6 object must be greater than or equal to 0.
youthvars_k6(25L)
#> Error: All non-missing values in valid youthvars_k6 object must be less than or equal to 24.

Kessler Psychological Distress Scale (K10) - Australian Scoring System

The youthvars_k10_aus class is defined for integer vectors with a minimum value of 10 and a maximum value of 50.

youthvars_k10_aus(21L)
#> [1] 21
#> attr(,"class")
#> [1] "youthvars_k10_aus" "integer"
youthvars_k10_aus(as.integer(c(13,25,41)))
#> [1] 13 25 41
#> attr(,"class")
#> [1] "youthvars_k10_aus" "integer"

Non-integers and values outside these ranges will produce errors.

youthvars_k10_aus(11.2)
#> Error in make_new_youthvars_k10_aus(x): is.integer(x) is not TRUE
youthvars_k10_aus(9L)
#> Error: All non-missing values in valid youthvars_k10_aus object must be greater than or equal to 10.
youthvars_k10_aus(51L)
#> Error: All non-missing values in valid youthvars_k10_aus object must be less than or equal to 50.

Kessler Psychological Distress Scale (K10) - US Scoring System

The youthvars_k10 class is defined for integer vectors with a minimum value of 0 and a maximum value of 40.

youthvars_k10(21L)
#> [1] 21
#> attr(,"class")
#> [1] "youthvars_k10" "integer"
youthvars_k10(as.integer(c(0,13,34)))
#> [1]  0 13 34
#> attr(,"class")
#> [1] "youthvars_k10" "integer"

Non-integers and values outside these ranges will produce errors.

youthvars_k10(11.2)
#> Error in make_new_youthvars_k10(x): is.integer(x) is not TRUE
youthvars_k10(-1L)
#> Error: All non-missing values in valid youthvars_k10 object must be greater than or equal to 0.
youthvars_k10(41L)
#> Error: All non-missing values in valid youthvars_k10 object must be less than or equal to 40.
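The ranges above imply a simple relationship between the two Kessler scoring systems. Australian scoring rates each item from 1 to 5, whereas US scoring rates it from 0 to 4. A complete Australian total therefore equals the US total plus the number of items: 6 for the K6 and 10 for the K10.

The helper below is a hypothetical sketch, not a youthvars function. It is only valid for totals computed from fully completed questionnaires.

```r
# Hypothetical helper: convert complete Kessler totals from US scoring (items 0-4)
# to Australian scoring (items 1-5); not part of youthvars
kessler_us_to_aus <- function(total_int, n_items_1L_int) {
  stopifnot(is.integer(total_int), n_items_1L_int %in% c(6L, 10L))
  max_us_1L_int <- 4L * n_items_1L_int
  if (any(total_int < 0L | total_int > max_us_1L_int, na.rm = TRUE))
    stop("US-scored totals must lie between 0 and ", max_us_1L_int, ".")
  total_int + n_items_1L_int  # each of the n items gains one point
}

kessler_us_to_aus(c(0L, 13L, 24L), 6L)   # K6: the 0-24 range maps onto 6-30
kessler_us_to_aus(c(0L, 13L, 40L), 10L)  # K10: the 0-40 range maps onto 10-50
```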

Overall Anxiety Severity and Impairment Scale (OASIS)

The youthvars_oasis class is defined for integer vectors with a minimum value of 0 and a maximum value of 20.

youthvars_oasis(15L)
#> [1] 15
#> attr(,"class")
#> [1] "youthvars_oasis" "integer"
youthvars_oasis(as.integer(c(0,12,20)))
#> [1]  0 12 20
#> attr(,"class")
#> [1] "youthvars_oasis" "integer"

Non-integers and values outside these ranges will produce errors.

youthvars_oasis(14.2)
#> Error in make_new_youthvars_oasis(x): is.integer(x) is not TRUE
youthvars_oasis(-1L)
#> Error: All non-missing values in valid youthvars_oasis object must be greater than or equal to 0.
youthvars_oasis(21L)
#> Error: All non-missing values in valid youthvars_oasis object must be less than or equal to 20.

Patient Health Questionnaire (PHQ-9)

The youthvars_phq9 class is defined for integer vectors with a minimum value of 0 and a maximum value of 27.

youthvars_phq9(11L)
#> [1] 11
#> attr(,"class")
#> [1] "youthvars_phq9" "integer"
youthvars_phq9(as.integer(c(0,13,27)))
#> [1]  0 13 27
#> attr(,"class")
#> [1] "youthvars_phq9" "integer"

Non-integers and values outside these ranges will produce errors.

youthvars_phq9(15.2)
#> Error in make_new_youthvars_phq9(x): is.integer(x) is not TRUE
youthvars_phq9(-1L)
#> Error: All non-missing values in valid youthvars_phq9 object must be greater than or equal to 0.
youthvars_phq9(28L)
#> Error: All non-missing values in valid youthvars_phq9 object must be less than or equal to 27.

Screen for Child Anxiety Related Emotional Disorders (SCARED)

The youthvars_scared class is defined for integer vectors with a minimum value of 0 and a maximum value of 82.

youthvars_scared(77L)
#> [1] 77
#> attr(,"class")
#> [1] "youthvars_scared" "integer"
youthvars_scared(as.integer(c(0,42,82)))
#> [1]  0 42 82
#> attr(,"class")
#> [1] "youthvars_scared" "integer"

Non-integers and values outside these ranges will produce errors.

youthvars_scared(33.2)
#> Error in make_new_youthvars_scared(x): is.integer(x) is not TRUE
youthvars_scared(-1L)
#> Error: All non-missing values in valid youthvars_scared object must be greater than or equal to 0.
youthvars_scared(83L)
#> Error: All non-missing values in valid youthvars_scared object must be less than or equal to 82.

Social and Occupational Functioning Assessment Scale (SOFAS)

The youthvars_sofas class is defined for integer vectors with a minimum value of 0 and a maximum value of 100.

youthvars_sofas(44L)
#> [1] 44
#> attr(,"class")
#> [1] "youthvars_sofas" "integer"
youthvars_sofas(as.integer(c(0,23,89)))
#> [1]  0 23 89
#> attr(,"class")
#> [1] "youthvars_sofas" "integer"

Non-integers and values outside these ranges will produce errors.

youthvars_sofas(73.2)
#> Error in make_new_youthvars_sofas(x): is.integer(x) is not TRUE
youthvars_sofas(-1L)
#> Error: All non-missing values in valid youthvars_sofas object must be greater than or equal to 0.
youthvars_sofas(103L)
#> Error: All non-missing values in valid youthvars_sofas object must be less than or equal to 100.

3 - Standardise variable values with fuzzy logic and correspondence tables

Costing health economic datasets is an activity that can involve repeated use of lookup tables. This tutorial describes how a module from the costly R package can help you to use a combination of fuzzy logic and correspondence tables to standardise variable values and thus facilitate partial automation of costing algorithms.

This section renders a vignette article from the costly library.

In brief

The steps described and explained in this vignette can also be (more succinctly) accomplished with the following code.

X <- CostlyCountries() 
X <- renew(X, type_1L_chr = "default") 
X <- renew(X, "jw", type_1L_chr = "slot", what_1L_chr = "logic") 
X <- renew(X, TRUE, type_1L_chr = "slot", what_1L_chr = "force")
X <- ratify(X) 

Create project

We begin by creating X, an instance of the CostlyCorrespondences module.

Supply seed dataset

Next, we create a CostlySeed module instance that includes a dataset containing our variable of interest (in this case, countries). The dataset needs to be paired with a data dictionary using the Ready4useDyad module from the ready4use R library. You can supply three things using a command of the following format:

- a custom seed dataset (a tibble);
- a dictionary (a ready4use_dictionary);
- the concept represented by our variable of interest.

# Not run
# A <- CostlySeed(Ready4useDyad_r4 = Ready4useDyad(ds_tb = tibble::tibble(), dictionary_r3 = ready4use_dictionary()), include_chr = c("Country"), label_1L_chr = "Country")

The add_default_country_seed function performs the previous step with default values. It pairs the world.cities dataset of the maps R library with an appropriate dictionary and specifies countries as the concept we will be standardising.

We can now inspect the first few records from our labelled seed dataset.

renewSlot(A, "Ready4useDyad_r4", type_1L_chr = "label") %>%
  exhibitSlot("Ready4useDyad_r4", display_1L_chr = "head", scroll_box_args_ls = list(width = "100%"))
Dataset
City name | Country name | Population size | Latitude coordinate | Longitude coordinate | Is the nation's capital city
'Abasan al-Jadidah | Palestine | 5629 | 31.31 | 34.34 | 0
'Abasan al-Kabirah | Palestine | 18999 | 31.32 | 34.35 | 0
'Abdul Hakim | Pakistan | 47788 | 30.55 | 72.11 | 0
'Abdullah-as-Salam | Kuwait | 21817 | 29.36 | 47.98 | 0
'Abud | Palestine | 2456 | 32.03 | 35.07 | 0
'Abwein | Palestine | 3434 | 32.03 | 35.20 | 0

We can also inspect the data dictionary contained in A.

exhibitSlot(A, "Ready4useDyad_r4", type_1L_chr = "dict", scroll_box_args_ls = list(width = "100%"))
Data Dictionary
Variable | Category | Description | Class
name | City | City name | character
country.etc | Country | Country name | character
pop | Population | Population size | integer
lat | Latitude | Latitude coordinate | numeric
long | Longitude | Longitude coordinate | numeric
capital | Capital | Is the nation's capital city | integer

We now specify the dictionary category that corresponds to the variable we wish to standardise (“Country”). We need to use the same category name to label the results objects that we generate in subsequent steps.

A@include_chr <- A@label_1L_chr <- "Country"
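Later steps locate the variable to standardise through its dictionary category, so it can help to confirm that the dataset and dictionary agree. The base R sketch below uses a toy dataset and a toy dictionary laid out like the ones displayed above. It checks for undocumented variables and class mismatches, then looks up the variable filed under a category. It is a hypothetical illustration, not the Ready4useDyad or ready4use_dictionary implementation, whose internal column names may differ.

```r
# Toy dataset and dictionary laid out like the ones displayed above (hypothetical values)
ds_df <- data.frame(name = c("City A", "City B"),
                    country.etc = c("Country X", "Country Y"),
                    pop = c(100L, 250L),
                    stringsAsFactors = FALSE)
dict_df <- data.frame(Variable = c("name", "country.etc", "pop"),
                      Category = c("City", "Country", "Population"),
                      Class = c("character", "character", "integer"),
                      stringsAsFactors = FALSE)

# Report dataset variables missing from the dictionary, and variables whose class disagrees
check_dictionary <- function(ds_df, dict_df) {
  actual_chr <- vapply(ds_df, function(col) class(col)[1], character(1))
  in_ds_lgl <- dict_df$Variable %in% names(ds_df)
  list(missing_vars_chr = setdiff(names(ds_df), dict_df$Variable),
       mismatched_vars_chr = dict_df$Variable[in_ds_lgl & dict_df$Class != actual_chr[dict_df$Variable]])
}

# Look up the variable name(s) filed under a dictionary category
get_vars_for_category <- function(dict_df, category_1L_chr) {
  dict_df$Variable[dict_df$Category == category_1L_chr]
}

check_dictionary(ds_df, dict_df)
get_vars_for_category(dict_df, "Country")
```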

We now add A to X.

X <- renew(X, A, what_1L_chr = "seed")

Specify standards

Next, we must specify a dataset that includes the complete list of allowable variable values.

The workflow for this step is similar to that for supplying the seed dataset, except that we use a CostlyStandards module instead of a CostlySeed module.

# Not run
# Y <- CostlyStandards(Ready4useDyad_r4 = Ready4useDyad(ds_tb = tibble::tibble(), dictionary_r3 = ready4use_dictionary()))

In many cases using the ISO_3166_1 dataset from the ISOcodes library will be the optimal choice for the standardised form of country names. We can use the add_country_standards function to pair this dataset with its dictionary and create B, a CostlyStandards module instance.

We can inspect the first few cases of the labelled version of the dataset in B.

renewSlot(B, "Ready4useDyad_r4", type_1L_chr = "label") %>% 
  exhibitSlot("Ready4useDyad_r4", display_1L_chr = "head", scroll_box_args_ls = list(width = "100%"))
Dataset
Alpabetical country code (two letters) | Alpabetical country code (three letters) | Numeric country code | Country name | Country name (official) | Country name (common alternative)
AW | ABW | 533 | Aruba | NA | NA
AF | AFG | 004 | Afghanistan | Islamic Republic of Afghanistan | NA
AO | AGO | 024 | Angola | Republic of Angola | NA
AI | AIA | 660 | Anguilla | NA | NA
AX | ALA | 248 | Åland Islands | NA | NA
AL | ALB | 008 | Albania | Republic of Albania | NA

We can also inspect the data dictionary contained in B.

exhibitSlot(B, "Ready4useDyad_r4", type_1L_chr = "dict", scroll_box_args_ls = list(width = "100%"))
Data Dictionary
Variable | Category | Description | Class
Alpha_2 | A2 | Alpabetical country code (two letters) | character
Alpha_3 | A3 | Alpabetical country code (three letters) | character
Numeric | N | Numeric country code | character
Name | Country | Country name | character
Official_name | Official | Country name (official) | character
Common_name | Common | Country name (common alternative) | character

We can now specify two things, both drawn from the “Category” column of the data dictionary:

- the concept that defines allowable values for our target variable;
- all the concepts we plan to use for fuzzy logic matching (described below).

B@label_1L_chr <- "Country"
B@include_chr <- c("Country", "Official","Common","A3","A2")

We now add B to X.

X <- renew(X, B, what_1L_chr = "standards")

Compare variable of interest values from seed and standards datasets

To identify any disparities between the variable of interest in our seed and standards datasets, we can use the ratify method. Supplying the value “identity” to the new_val_xx argument ensures that the output differs from the input only in the slot reserved for results.

X <- ratify(X, new_val_xx = "identity")

We can now identify the values of our seed dataset's variable of interest that are not among our standard values.

X@results_ls$Country_Output_Validation$Invalid_Values

We can also identify standard values that were not present in the seed dataset variable of interest.

X@results_ls$Country_Output_Validation$Absent_Values
#>  [1] "Åland Islands"                                "Antarctica"                                   "Bolivia, Plurinational State of"              "Bonaire, Sint Eustatius and Saba"            
#>  [5] "Bouvet Island"                                "British Indian Ocean Territory"               "Brunei Darussalam"                            "Cabo Verde"                                  
#>  [9] "Christmas Island"                             "Cocos (Keeling) Islands"                      "Congo, The Democratic Republic of the"        "Côte d'Ivoire"                               
#> [13] "Curaçao"                                      "Czechia"                                      "Eswatini"                                     "Falkland Islands (Malvinas)"                 
#> [17] "French Southern Territories"                  "Guernsey"                                     "Heard Island and McDonald Islands"            "Holy See (Vatican City State)"               
#> [21] "Hong Kong"                                    "Iran, Islamic Republic of"                    "Korea, Democratic People's Republic of"       "Korea, Republic of"                          
#> [25] "Lao People's Democratic Republic"             "Macao"                                        "Micronesia, Federated States of"              "Moldova, Republic of"                        
#> [29] "Palestine, State of"                          "Réunion"                                      "Russian Federation"                           "Saint Barthélemy"                            
#> [33] "Saint Helena, Ascension and Tristan da Cunha" "Saint Martin (French part)"                   "Saint Vincent and the Grenadines"             "Sint Maarten (Dutch part)"                   
#> [37] "South Georgia and the South Sandwich Islands" "Syrian Arab Republic"                         "Taiwan, Province of China"                    "Tanzania, United Republic of"                
#> [41] "Timor-Leste"                                  "Turks and Caicos Islands"                     "United Kingdom"                               "United States"                               
#> [45] "United States Minor Outlying Islands"         "Venezuela, Bolivarian Republic of"            "Viet Nam"                                     "Virgin Islands, British"                     
#> [49] "Virgin Islands, U.S."
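In base R terms, the invalid and absent value vectors behave like set differences taken in opposite directions. The sketch below uses toy values, and its logic is an assumption rather than the costly implementation.

```r
# Toy seed and standards values (hypothetical)
seed_chr <- c("Australia", "Czech Republic", "Brunei", "Australia")
standards_chr <- c("Australia", "Czechia", "Brunei Darussalam", "Antarctica")

invalid_values_chr <- setdiff(seed_chr, standards_chr)  # seed values with no exact standard match
absent_values_chr <- setdiff(standards_chr, seed_chr)   # standard values never used in the seed data
invalid_values_chr
absent_values_chr
```

setdiff() also removes duplicates. A seed value that appears many times, as country names do in a city-level dataset, is therefore reported only once.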

Standardise variable values

We can explore the extent to which we can use fuzzy logic to reconcile some of these discrepancies. To identify the types of fuzzy logic algorithms we could use, run the following command to explore the relevant part of the documentation from the stringdist library.

# Not run
# help("stringdist-metrics", package=stringdist)

In this case, we have chosen the Jaro (or Jaro-Winkler) distance method (“jw”).

X <- renew(X, "jw", type_1L_chr = "slot", what_1L_chr = "logic") 
X <- ratify(X, new_val_xx = NULL)

This method will replace every previously invalid seed dataset variable value with the best available match identified by the selected fuzzy logic algorithm.

X@results_ls$Country_Output_Validation$Invalid_Values
#> character(0)
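To make the idea behind the "jw" method concrete, the sketch below implements a textbook Jaro distance in base R. The Winkler prefix adjustment is omitted, which corresponds to stringdist's default of p = 0. The sketch then uses this distance to pick the closest standard value for each invalid seed value. It is an illustration under these assumptions, not the costly or stringdist code, and edge-case behaviour may differ from stringdist.

```r
# Textbook Jaro distance: 0 = identical strings, 1 = no matching characters
jaro_distance <- function(a, b) {
  n1 <- nchar(a); n2 <- nchar(b)
  if (n1 == 0 && n2 == 0) return(0)
  if (n1 == 0 || n2 == 0) return(1)
  s1 <- strsplit(a, "")[[1]]; s2 <- strsplit(b, "")[[1]]
  window <- max(floor(max(n1, n2) / 2) - 1, 0)
  matched1 <- logical(n1); matched2 <- logical(n2)
  for (i in seq_len(n1)) {
    lo <- max(1, i - window); hi <- min(n2, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      # Each character of b may be matched at most once, within the search window
      if (!matched2[j] && s1[i] == s2[j]) {
        matched1[i] <- TRUE; matched2[j] <- TRUE
        break
      }
    }
  }
  m <- sum(matched1)
  if (m == 0) return(1)
  t <- sum(s1[matched1] != s2[matched2]) / 2  # half the matched characters that are out of order
  1 - (m / n1 + m / n2 + (m - t) / m) / 3
}

# Replace each value with the standard value at the smallest Jaro distance
closest_standard <- function(x_chr, standards_chr) {
  vapply(x_chr, function(x) {
    d_dbl <- vapply(standards_chr, function(s) jaro_distance(x, s), numeric(1))
    standards_chr[which.min(d_dbl)]
  }, character(1), USE.NAMES = FALSE)
}

closest_standard(c("Cape Verde", "Czech Republic"), c("Cabo Verde", "Czechia", "Chile"))
```

Only the single closest candidate is kept. A value with no true counterpart among the standards is therefore still mapped to something, which is how spurious replacements such as Azores to Timor-Leste can arise.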

However, some of the replacements are spurious, as can be seen by inspecting the record of the replacements made.

X@results_ls$Country_Output_Correspondences
#> # A tibble: 41 × 2
#>    old_nms_chr               new_nms_chr                          
#>    <chr>                     <chr>                                
#>  1 Azores                    Timor-Leste                          
#>  2 Bolivia                   Bolivia, Plurinational State of      
#>  3 British Virgin Islands    Virgin Islands, British              
#>  4 Brunei                    Brunei Darussalam                    
#>  5 Canary Islands            Åland Islands                        
#>  6 Cape Verde                Cabo Verde                           
#>  7 Congo Democratic Republic Congo, The Democratic Republic of the
#>  8 Czech Republic            Czechia                              
#>  9 East Timor                Eswatini                             
#> 10 Easter Island             Christmas Island                     
#> # ℹ 31 more rows

For each of the incorrect correspondences, we will need to manually specify correct values. We can do this using the ready4show_correspondences sub-module.

# Not run
# a <- ready4show::renew.ready4show_correspondences(ready4show::ready4show_correspondences(), 
#         old_nms_chr = c("old_name_1", "old_name_2", "etc...."), new_nms_chr = c("new_name_1", "new_name_2", "etc...."))

The make_country_correspondences function can be used as a shortcut for creating the alternative correspondences for this specific example.

We can inspect the values of this correspondence table.

exhibit(a, scroll_box_args_ls = list(width = "100%"))
Old name | New name
Azores | Portugal
Canary Islands | Spain
Easter Island | Chile
East Timor | Timor-Leste
Ivory Coast | Côte d'Ivoire
Kosovo | Kosovo
Madeira | Portugal
Netherlands Antilles | Bonaire, Sint Eustatius and Saba
Sicily | Italy
Vatican City | Holy See (Vatican City State)

When the ratify method was used to apply the fuzzy logic algorithm in a previous step, X was modified so that this logic is by default switched off for future calls to ratify. If we had created a new correspondence table that specified replacements for all invalid values, this would not be a problem. However, in this example we are only specifying correspondences where the fuzzy logic algorithm failed, so we need to again supply our desired fuzzy logic value.

X <- renew(X, "jw", type_1L_chr = "slot", what_1L_chr = "logic") 

We now rerun our ratify method (which in this example will combine fuzzy logic with lookups from the manually created correspondences table).

X <- ratify(X, new_val_xx = a)
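The order of operations described here can be sketched in base R:

1. apply the manual correspondences first;
2. fuzzy-match only the values that remain unresolved;
3. optionally force the removal of anything still invalid.

For brevity, this sketch swaps in utils::adist (a generalised Levenshtein distance) for the Jaro-Winkler method used above. The helper and its arguments are hypothetical and are not the costly implementation.

```r
# Hypothetical helper combining lookups, fuzzy matching and forced removal; not costly code
standardise_values <- function(x_chr, standards_chr, old_nms_chr, new_nms_chr, force_1L_lgl = FALSE) {
  # 1. Apply the manual correspondences first
  idx_int <- match(x_chr, old_nms_chr)
  has_corr_lgl <- !is.na(idx_int)
  x_chr[has_corr_lgl] <- new_nms_chr[idx_int[has_corr_lgl]]
  # 2. Fuzzy-match only values that are still invalid and had no manual correspondence
  to_match_lgl <- !has_corr_lgl & !(x_chr %in% standards_chr)
  if (any(to_match_lgl)) {
    dist_mat <- utils::adist(x_chr[to_match_lgl], standards_chr)  # Levenshtein, not Jaro-Winkler
    x_chr[to_match_lgl] <- standards_chr[apply(dist_mat, 1, which.min)]
  }
  # 3. Optionally drop anything that is still not a standard value
  if (force_1L_lgl) x_chr <- x_chr[x_chr %in% standards_chr]
  x_chr
}

seed_chr <- c("Azores", "Cape Verde", "Kosovo", "Chile")
standards_chr <- c("Portugal", "Cabo Verde", "Chile")
standardise_values(seed_chr, standards_chr,
                   old_nms_chr = c("Azores", "Kosovo"), new_nms_chr = c("Portugal", "Kosovo"))
standardise_values(seed_chr, standards_chr,
                   old_nms_chr = c("Azores", "Kosovo"), new_nms_chr = c("Portugal", "Kosovo"),
                   force_1L_lgl = TRUE)
```

As in the example above, "Kosovo" survives the first call because its manual correspondence points to a value that is not in the standards. It is only removed when removal is forced.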

We once again inspect the results.

Our correspondences table now looks better.

X@results_ls$Country_Output_Correspondences
#> # A tibble: 41 × 2
#>    old_nms_chr               new_nms_chr                          
#>    <chr>                     <chr>                                
#>  1 Azores                    Portugal                             
#>  2 Bolivia                   Bolivia, Plurinational State of      
#>  3 British Virgin Islands    Virgin Islands, British              
#>  4 Brunei                    Brunei Darussalam                    
#>  5 Canary Islands            Spain                                
#>  6 Cape Verde                Cabo Verde                           
#>  7 Congo Democratic Republic Congo, The Democratic Republic of the
#>  8 Czech Republic            Czechia                              
#>  9 East Timor                Timor-Leste                          
#> 10 Easter Island             Chile                                
#> # ℹ 31 more rows

However, one value is still not included in our standards dataset.

X@results_ls$Country_Output_Validation$Invalid_Values
#> [1] "Kosovo"

We can set the force slot to TRUE and rerun the ratify method to remove any record whose value is not included in our standards dataset.

X <- renew(X, T, type_1L_chr = "slot", what_1L_chr = "force") 
X <- ratify(X, new_val_xx = "identity")
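
Conceptually, forcing amounts to dropping every record whose value is still not among the allowed standard values. A minimal base-R sketch of that filter, using a hypothetical force_valid helper and toy data, might look like the following. The actual costly implementation may differ.

```r
# Hypothetical sketch of the "force" step (not the costly implementation):
# keep only records whose value appears in the standards.
force_valid <- function(ds, var_nm, standards) {
  ds[ds[[var_nm]] %in% standards, , drop = FALSE]
}

cities <- data.frame(
  city = c("Rome", "Madrid", "Pristina"),
  country = c("Italy", "Spain", "Kosovo"),
  stringsAsFactors = FALSE
)
# "Kosovo" is not an allowed value here, so its record is dropped
kept <- force_valid(cities, "country", standards = c("Italy", "Spain"))
kept$country
```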

No invalid values remain.

X@results_ls$Country_Output_Validation$Invalid_Values
#> character(0)

However, some values from our standards dataset are also not represented in the results dataset.

X@results_ls$Country_Output_Validation$Absent_Values
#>  [1] "Åland Islands"                                "Antarctica"                                   "Bouvet Island"                                "British Indian Ocean Territory"              
#>  [5] "Christmas Island"                             "Cocos (Keeling) Islands"                      "Curaçao"                                      "French Southern Territories"                 
#>  [9] "Heard Island and McDonald Islands"            "Hong Kong"                                    "Macao"                                        "Sint Maarten (Dutch part)"                   
#> [13] "South Georgia and the South Sandwich Islands" "United States Minor Outlying Islands"
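
The two validation lists behave like set differences taken in opposite directions. The sketch below shows this relationship using base R's setdiff on toy data. validate_values is a hypothetical helper, and we assume (rather than know) that costly computes Invalid_Values and Absent_Values this way.

```r
# Hypothetical sketch of the two validation checks using base R set operations
validate_values <- function(values, standards) {
  list(
    # values present in the dataset that the standards do not allow
    Invalid_Values = setdiff(values, standards),
    # allowed standard values that never occur in the dataset
    Absent_Values = setdiff(standards, values)
  )
}

checks <- validate_values(values = c("Italy", "Spain", "Italy"),
                          standards = c("Italy", "Spain", "Hong Kong"))
str(checks)
```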

Whether this is a problem depends on the intended purposes of the standardised dataset we are creating. We could rerun the previous steps after editing the standards dataset (e.g. deleting superfluous, outdated or incorrect records, or using an entirely new standards dataset), the seed dataset (e.g. adding new records or recategorising existing records so that every missing standard value has a corresponding value), or both. In this case we assume that the missing values above do not prevent the valid use of our updated dataset for its intended purposes. We can now create a new object Y from our results dataset's Ready4useDyad module instance.

Y <- X@results_ls$Country_Output_Lookup

We can inspect the records in our new dataset that correspond to capital cities.

renewSlot(Y, "ds_tb", Y@ds_tb %>% dplyr::filter(capital == 1)) %>%
  renew(type_1L_chr = "label") %>%
  exhibit(scroll_box_args_ls = list(width = "100%"))
Dataset
| City name | Country name | Population size | Latitude coordinate | Longitude coordinate | Is the nation's capital city |
| --- | --- | --- | --- | --- | --- |
| 'Amman | Jordan | 1303197 | 31.95 | 35.93 | 1 |
| Abu Dhabi | United Arab Emirates | 619316 | 24.48 | 54.37 | 1 |
| Abuja | Nigeria | 178462 | 9.18 | 7.17 | 1 |
| Accra | Ghana | 2029143 | 5.56 | -0.20 | 1 |
| Adamstown | Pitcairn | 51 | -25.05 | -130.10 | 1 |
| Addis Abeba | Ethiopia | 2823167 | 9.03 | 38.74 | 1 |
| Agana | Guam | 1041 | 13.47 | 144.75 | 1 |
| Algiers | Algeria | 2029936 | 36.77 | 3.04 | 1 |
| Alofi | Niue | 627 | -19.05 | -169.92 | 1 |
| Amsterdam | Netherlands | 744159 | 52.37 | 4.89 | 1 |
| Andorra la Vella | Andorra | 20314 | 42.51 | 1.51 | 1 |
| Ankara | Turkey | 3579706 | 39.93 | 32.85 | 1 |
| Antananarivo | Madagascar | 1463754 | -18.89 | 47.51 | 1 |
| Apia | Samoa | 40805 | -13.83 | -171.76 | 1 |
| Asgabat | Turkmenistan | 823013 | 37.95 | 58.38 | 1 |
| Asmara | Eritrea | 578860 | 15.33 | 38.94 | 1 |
| Astana | Kazakhstan | 351343 | 51.17 | 71.47 | 1 |
| Asuncion | Paraguay | 507574 | -25.30 | -57.63 | 1 |
| Athens | Greece | 725049 | 37.98 | 23.73 | 1 |
| Avarua | Cook Islands | 13645 | -21.20 | -159.76 | 1 |
| Baghdad | Iraq | 5753612 | 33.33 | 44.44 | 1 |
| Bairiki | Kiribati | 45982 | 1.33 | 172.99 | 1 |
| Baku | Azerbaijan | 1118725 | 40.39 | 49.86 | 1 |
| Bamako | Mali | 1342519 | 12.65 | -7.99 | 1 |
| Bandar Seri Begawan | Brunei Darussalam | 67077 | 4.93 | 114.95 | 1 |
| Bangkok | Thailand | 4935988 | 13.73 | 100.50 | 1 |
| Bangui | Central African Republic | 547668 | 4.36 | 18.56 | 1 |
| Banjul | Gambia | 34388 | 13.46 | -16.60 | 1 |
| Basse-Terre | Guadeloupe | 11298 | 16.00 | -61.72 | 1 |
| Basseterre | Saint Kitts and Nevis | 12883 | 17.31 | -62.73 | 1 |
| Bayrut | Lebanon | 1273440 | 33.88 | 35.50 | 1 |
| Beijing | China | 7602069 | 39.93 | 116.40 | 1 |
| Belgrade | Serbia | 1113589 | 44.83 | 20.50 | 1 |
| Belmopan | Belize | 14590 | 17.25 | -88.79 | 1 |
| Berlin | Germany | 3378275 | 52.52 | 13.38 | 1 |
| Bern | Switzerland | 120596 | 46.95 | 7.44 | 1 |
| Biskek | Kyrgyzstan | 915625 | 42.87 | 74.57 | 1 |
| Bissau | Guinea-Bissau | 404119 | 11.87 | -15.60 | 1 |
| Bogota | Colombia | 7235084 | 4.63 | -74.09 | 1 |
| Brasilia | Brazil | 2260541 | -15.78 | -47.91 | 1 |
| Bratislava | Slovakia | 422452 | 48.16 | 17.13 | 1 |
| Brazzaville | Congo | 1326975 | -4.25 | 15.26 | 1 |
| Bridgetown | Barbados | 98725 | 13.11 | -59.61 | 1 |
| Brussels | Belgium | 1031925 | 50.83 | 4.33 | 1 |
| Bucharest | Romania | 1862930 | 44.44 | 26.10 | 1 |
| Budapest | Hungary | 1700019 | 47.51 | 19.08 | 1 |
| Buenos Aires | Argentina | 11595183 | -34.61 | -58.37 | 1 |
| Bujumbura | Burundi | 336561 | -3.37 | 29.35 | 1 |
| Cairo | Egypt | 7836243 | 30.06 | 31.25 | 1 |
| Canberra | Australia | 324736 | -35.31 | 149.13 | 1 |
| Caracas | Venezuela, Bolivarian Republic of | 1808937 | 10.54 | -66.93 | 1 |
| Castries | Saint Lucia | 12904 | 14.03 | -60.98 | 1 |
| Cayenne | French Guiana | 62926 | 4.92 | -52.34 | 1 |
| Charlotte Amalie | Virgin Islands, U.S. | 10415 | 18.35 | -64.94 | 1 |
| Chisinau | Moldova, Republic of | 623671 | 47.03 | 28.83 | 1 |
| Cockburn Town | Turks and Caicos Islands | 174 | 21.46 | -71.14 | 1 |
| Colombo | Sri Lanka | 649496 | 6.93 | 79.85 | 1 |
| Conakry | Guinea | 1970382 | 9.55 | -13.67 | 1 |
| Copenhagen | Denmark | 1091978 | 55.68 | 12.57 | 1 |
| Dakar | Senegal | 2406598 | 14.72 | -17.48 | 1 |
| Damascus | Syrian Arab Republic | 1580909 | 33.50 | 36.32 | 1 |
| Dhaka | Bangladesh | 6724976 | 23.70 | 90.39 | 1 |
| Dili | Timor-Leste | 163305 | -8.57 | 125.58 | 1 |
| Dodoma | Tanzania, United Republic of | 188150 | -6.17 | 35.74 | 1 |
| Doha | Qatar | 351381 | 25.30 | 51.51 | 1 |
| Douglas | Isle of Man | 25621 | 54.15 | -4.48 | 1 |
| Dublin | Ireland | 1030431 | 53.33 | -6.25 | 1 |
| Dushanbe | Tajikistan | 538456 | 38.57 | 68.78 | 1 |
| Dzaoudzi | Mayotte | 14558 | -12.77 | 45.25 | 1 |
| Fakaofo | Tokelau | 267 | -9.38 | -171.22 | 1 |
| Fort-de-France | Martinique | 89233 | 14.60 | -61.08 | 1 |
| Freetown | Sierra Leone | 818709 | 8.49 | -13.24 | 1 |
| Gaborone | Botswana | 214412 | -24.65 | 25.91 | 1 |
| George Town | Cayman Islands | 30570 | 19.28 | -81.39 | 1 |
| Georgetown | Guyana | 236878 | 6.79 | -58.16 | 1 |
| Gibraltar | Gibraltar | 26404 | 36.14 | -5.35 | 1 |
| Guatemala | Guatemala | 1010253 | 14.63 | -90.55 | 1 |
| Ha Noi | Viet Nam | 1452055 | 21.03 | 105.84 | 1 |
| Hamilton | Bermuda | 889 | 32.30 | -64.79 | 1 |
| Harare | Zimbabwe | 1575127 | -17.82 | 31.05 | 1 |
| Havanna | Cuba | 2163132 | 23.13 | -82.39 | 1 |
| Helsinki | Finland | 558341 | 60.17 | 24.94 | 1 |
| Honiara | Solomon Islands | 57410 | -9.43 | 159.91 | 1 |
| Islamabad | Pakistan | 794431 | 33.72 | 73.06 | 1 |
| Jakarta | Indonesia | 8556798 | -6.18 | 106.83 | 1 |
| Jamestown | Saint Helena, Ascension and Tristan da Cunha | 603 | -15.92 | -5.71 | 1 |
| Jerusalem | Israel | 731731 | 31.78 | 35.22 | 1 |
| Jibuti | Djibouti | 633884 | 11.56 | 43.15 | 1 |
| Kabul | Afghanistan | 3120963 | 34.53 | 69.17 | 1 |
| Kampala | Uganda | 1403619 | 0.32 | 32.58 | 1 |
| Kathmandu | Nepal | 822930 | 27.71 | 85.31 | 1 |
| Khartoum | Sudan | 2090001 | 15.58 | 32.52 | 1 |
| Kiev | Ukraine | 2491404 | 50.43 | 30.52 | 1 |
| Kigali | Rwanda | 800003 | -1.94 | 30.06 | 1 |
| Kingston | Jamaica | 585300 | 17.99 | -76.80 | 1 |
| Kingston | Norfolk Island | 890 | -29.03 | 168.05 | 1 |
| Kingstown | Saint Vincent and the Grenadines | 18160 | 13.16 | -61.23 | 1 |
| Kinshasa | Congo, The Democratic Republic of the | 8096254 | -4.31 | 15.32 | 1 |
| Koror | Palau | 11458 | 7.35 | 134.51 | 1 |
| Kuala Lumpur | Malaysia | 1482359 | 3.16 | 101.71 | 1 |
| Libreville | Gabon | 591356 | 0.39 | 9.45 | 1 |
| Lilongwe | Malawi | 683477 | -13.97 | 33.80 | 1 |
| Lima | Peru | 7857121 | -12.07 | -77.05 | 1 |
| Lisbon | Portugal | 508209 | 38.72 | -9.14 | 1 |
| Ljubljana | Slovenia | 254188 | 46.06 | 14.51 | 1 |
| Lome | Togo | 737751 | 6.17 | 1.35 | 1 |
| London | United Kingdom | 7489022 | 51.52 | -0.10 | 1 |
| Longyearbyen | Svalbard and Jan Mayen | 1263 | 78.21 | 15.61 | 1 |
| Luanda | Angola | 2875277 | -8.82 | 13.24 | 1 |
| Lusaka | Zambia | 1306577 | -15.42 | 28.29 | 1 |
| Luxemburg | Luxembourg | 76380 | 49.62 | 6.12 | 1 |
| Madrid | Spain | 3146804 | 40.42 | -3.71 | 1 |
| Malabo | Equatorial Guinea | 161409 | 3.74 | 8.79 | 1 |
| Male | Maldives | 87154 | 4.17 | 73.50 | 1 |
| Managua | Nicaragua | 990417 | 12.15 | -86.27 | 1 |
| Manama | Bahrain | 147894 | 26.21 | 50.58 | 1 |
| Manila | Philippines | 10546511 | 14.62 | 120.97 | 1 |
| Maputo | Mozambique | 1220167 | -25.95 | 32.57 | 1 |
| Maseru | Lesotho | 116268 | -29.31 | 27.49 | 1 |
| Mata'utu | Wallis and Futuna | 1310 | -13.28 | -176.13 | 1 |
| Mbabane | Eswatini | 78740 | -26.32 | 31.14 | 1 |
| Mexico City | Mexico | 8659409 | 19.43 | -99.14 | 1 |
| Minsk | Belarus | 1747482 | 53.91 | 27.55 | 1 |
| Mogadishu | Somalia | 2723378 | 2.05 | 45.33 | 1 |
| Monaco-Ville | Monaco | 975 | 43.74 | 7.42 | 1 |
| Monrovia | Liberia | 954458 | 6.31 | -10.80 | 1 |
| Montevideo | Uruguay | 1271664 | -34.87 | -56.17 | 1 |
| Moroni | Comoros | 43704 | -11.74 | 43.23 | 1 |
| Moscow | Russian Federation | 10472629 | 55.75 | 37.62 | 1 |
| Muscat | Oman | 24122 | 23.61 | 58.54 | 1 |
| N'Djamena | Chad | 737281 | 12.11 | 15.05 | 1 |
| Nairobi | Kenya | 2864667 | -1.29 | 36.82 | 1 |
| Nassau | Bahamas | 231519 | 25.06 | -77.33 | 1 |
| Ni Dilli | India | 321883 | 28.60 | 77.22 | 1 |
| Niamey | Niger | 801297 | 13.52 | 2.12 | 1 |
| Nicosia | Cyprus | 202488 | 35.16 | 33.38 | 1 |
| Nicosia | Cyprus | 42372 | 35.18 | 33.37 | 1 |
| Nouakchott | Mauritania | 731242 | 18.09 | -15.98 | 1 |
| Noumea | New Caledonia | 94751 | -22.27 | 166.44 | 1 |
| Nuku'alofa | Tonga | 23733 | -21.14 | -175.22 | 1 |
| Nuuk | Greenland | 15243 | 64.18 | -51.73 | 1 |
| Oranjestad | Aruba | 30710 | 12.53 | -70.03 | 1 |
| Oslo | Norway | 821445 | 59.91 | 10.75 | 1 |
| Ottawa | Canada | 885542 | 45.42 | -75.71 | 1 |
| Ouagadougou | Burkina Faso | 1119775 | 12.37 | -1.53 | 1 |
| Pago Pago | American Samoa | 4180 | -14.24 | -170.72 | 1 |
| Palikir | Micronesia, Federated States of | 4552 | 6.92 | 158.16 | 1 |
| Panama | Panama | 406070 | 8.97 | -79.53 | 1 |
| Papeete | French Polynesia | 26400 | -17.52 | -149.56 | 1 |
| Paramaribo | Suriname | 224925 | 5.85 | -55.20 | 1 |
| Paris | France | 2141839 | 48.86 | 2.34 | 1 |
| Phnum Penh | Cambodia | 1673131 | 11.57 | 104.92 | 1 |
| Port Louis | Mauritius | 156760 | -20.17 | 57.51 | 1 |
| Port Moresby | Papua New Guinea | 289861 | -9.48 | 147.18 | 1 |
| Port Stanley | Falkland Islands (Malvinas) | 2269 | -51.70 | -57.82 | 1 |
| Port of Spain | Trinidad and Tobago | 49764 | 10.66 | -61.51 | 1 |
| Port-au-Prince | Haiti | 1277104 | 18.54 | -72.34 | 1 |
| Porto Novo | Benin | 238199 | 6.48 | 2.63 | 1 |
| Prague | Czechia | 1168374 | 50.08 | 14.43 | 1 |
| Praia | Cabo Verde | 117342 | 14.93 | -23.54 | 1 |
| Pretoria | South Africa | 1687779 | -25.73 | 28.22 | 1 |
| Pyongyang | Korea, Democratic People's Republic of | 2992272 | 39.02 | 125.75 | 1 |
| Quito | Ecuador | 1399814 | -0.19 | -78.50 | 1 |
| Rabat | Morocco | 1688738 | 34.02 | -6.84 | 1 |
| Rangoon | Myanmar | 4572948 | 16.79 | 96.15 | 1 |
| Reykjavik | Iceland | 114576 | 64.14 | -21.92 | 1 |
| Riga | Latvia | 738386 | 56.97 | 24.13 | 1 |
| Rita | Marshall Islands | 21270 | 7.12 | 171.06 | 1 |
| Riyadh | Saudi Arabia | 4328067 | 24.65 | 46.77 | 1 |
| Road Town | Virgin Islands, British | 8613 | 18.43 | -64.63 | 1 |
| Rome | Italy | 2561181 | 41.89 | 12.50 | 1 |
| Roseau | Dominica | 16577 | 15.30 | -61.39 | 1 |
| Saint George's | Grenada | 4315 | 12.06 | -61.74 | 1 |
| Saint Helier | Jersey | 28910 | 49.19 | -2.11 | 1 |
| Saint John's | Antigua and Barbuda | 25321 | 17.11 | -61.85 | 1 |
| Saint Peter Port | Guernsey | 16702 | 49.47 | -2.55 | 1 |
| Saint-Denis | Réunion | 137787 | -20.87 | 55.46 | 1 |
| Saint-Pierre | Saint Pierre and Miquelon | 6254 | 46.79 | -56.18 | 1 |
| San Jose | Costa Rica | 32187 | 10.97 | -85.13 | 1 |
| San Jose | Costa Rica | 339588 | 9.93 | -84.08 | 1 |
| San Juan | Puerto Rico | 417154 | 18.44 | -66.13 | 1 |
| San Marino | San Marino | 4624 | 43.94 | 12.43 | 1 |
| San Salvador | El Salvador | 534409 | 13.69 | -89.19 | 1 |
| San'a | Yemen | 1921589 | 15.38 | 44.21 | 1 |
| Santiago | Chile | 4893495 | -33.46 | -70.64 | 1 |
| Santo Domingo | Dominican Republic | 2253437 | 18.48 | -69.91 | 1 |
| Sao Tome | Sao Tome and Principe | 63772 | 0.37 | 6.73 | 1 |
| Sarajevo | Bosnia and Herzegovina | 737350 | 43.85 | 18.38 | 1 |
| Singapore | Singapore | 3601745 | 1.30 | 103.85 | 1 |
| Skopje | North Macedonia | 477493 | 42.00 | 21.47 | 1 |
| Sofia | Bulgaria | 1166143 | 42.69 | 23.31 | 1 |
| Seoul | Korea, Republic of | 10409345 | 37.56 | 126.99 | 1 |
| Stockholm | Sweden | 1260712 | 59.33 | 18.07 | 1 |
| Sucre | Bolivia, Plurinational State of | 232669 | -19.06 | -65.26 | 1 |
| Susupe | Northern Mariana Islands | 2402 | 15.14 | 145.70 | 1 |
| Taipei | Taiwan, Province of China | 2491662 | 25.02 | 121.45 | 1 |
| Tallinn | Estonia | 392386 | 59.44 | 24.74 | 1 |
| Tashkent | Uzbekistan | 1967879 | 41.31 | 69.30 | 1 |
| Tbilisi | Georgia | 1038343 | 41.72 | 44.79 | 1 |
| Tegucigalpa | Honduras | 872403 | 14.09 | -87.22 | 1 |
| Tehran | Iran, Islamic Republic of | 7160094 | 35.67 | 51.43 | 1 |
| The Valley | Anguilla | 1435 | 18.22 | -63.05 | 1 |
| Thimphu | Bhutan | 74175 | 27.48 | 89.70 | 1 |
| Tirana | Albania | 380403 | 41.33 | 19.82 | 1 |
| Tokyo | Japan | 8372440 | 35.67 | 139.77 | 1 |
| Torshavn | Faroe Islands | 13313 | 62.03 | -6.80 | 1 |
| Tripoli | Libya | 1164634 | 32.87 | 13.18 | 1 |
| Tunis | Tunisia | 693294 | 36.84 | 10.22 | 1 |
| Ulaanbaatar | Mongolia | 862842 | 47.93 | 106.91 | 1 |
| Vaduz | Liechtenstein | 5248 | 47.14 | 9.53 | 1 |
| Vaiaku | Tuvalu | 4835 | -8.52 | 179.20 | 1 |
| Valletta | Malta | 6748 | 35.91 | 14.52 | 1 |
| Vatican City | Holy See (Vatican City State) | 767 | 41.90 | 12.46 | 1 |
| Victoria | Seychelles | 22611 | -4.62 | 55.45 | 1 |
| Vienna | Austria | 1570976 | 48.22 | 16.37 | 1 |
| Vientiane | Lao People's Democratic Republic | 199863 | 17.97 | 102.61 | 1 |
| Vila | Vanuatu | 37141 | -17.74 | 168.31 | 1 |
| Vilnius | Lithuania | 542014 | 54.70 | 25.27 | 1 |
| Warsaw | Poland | 1634441 | 52.26 | 21.02 | 1 |
| Washington | United States | 548359 | 38.91 | -77.02 | 1 |
| Wellington | New Zealand | 182254 | -41.28 | 174.78 | 1 |
| Willemstad | Bonaire, Sint Eustatius and Saba | 98339 | 12.10 | -68.93 | 1 |
| Windhoek | Namibia | 277349 | -22.56 | 17.09 | 1 |
| Yamoussoukro | Côte d'Ivoire | 200103 | 6.82 | -5.28 | 1 |
| Yaounde | Cameroon | 1344617 | 3.87 | 11.52 | 1 |
| Yaren | Nauru | 4587 | -0.55 | 166.91 | 1 |
| Yerevan | Armenia | 1090537 | 40.17 | 44.52 | 1 |
| Zagreb | Croatia | 700717 | 45.80 | 15.97 | 1 |
| al-'Ayun | Western Sahara | 188084 | 27.16 | -13.20 | 1 |
| al-Kuwayt | Kuwait | 63596 | 29.38 | 47.99 | 1 |

4 - Standardise Variable Values With Lookup Codes

This tutorial describes how a module from the costly R package can help you to use lookup codes to standardise variable values and thus facilitate partial automation of costing algorithms.

The section below renders a vignette article from the costly library.

Note: Parts of the workflow described in this article are explained in more detail in the article outlining the workflow that uses fuzzy logic and correspondence tables.

In brief

The steps explained in this vignette can also be accomplished more succinctly with the following code.

X <- CostlyCountries()
X <- renew(X,
           new_val_xx = add_default_currency_seed(X@CostlySeed_r4, include_1L_chr = "Country"), 
           what_1L_chr = "seed")
X <- renew(X, "jw", type_1L_chr = "slot", what_1L_chr = "logic") 
X <- renew(X, new_val_xx = make_country_correspondences("currencies"), what_1L_chr = "correspondences") 
X <- renew(X, T, type_1L_chr = "slot", what_1L_chr = "force") 
X <- ratify(X)
Y <- CostlyCurrencies()
Y <- renew(Y, new_val_xx = add_default_currency_seed(Y@CostlySeed_r4,
                                                     Ready4useDyad_r4 = X@results_ls$Country_Output_Lookup), 
           what_1L_chr = "seed")
Y <- ratify(Y, type_1L_chr = "Lookup")
Y <- renew(Y, T, type_1L_chr = "slot", what_1L_chr = "force") 
Y <- ratify(Y, type_1L_chr = "Lookup")

Create project

We begin by creating X, a CostlyCorrespondences module instance.

Supply seed dataset

We next create A, a CostlySeed module instance that includes a dataset containing our variable of interest (in this case, countries). The dataset needs to be paired with a data dictionary using the Ready4useDyad module from the ready4use R library. You can supply a custom seed dataset (a tibble), a dictionary (a ready4use_dictionary) and the concept represented by the variable of interest using a command of the following format.

# Not run
# A <- CostlySeed(Ready4useDyad_r4 = Ready4useDyad(ds_tb = tibble::tibble(), dictionary_r3 = ready4use_dictionary()), include_chr = c("Country"), label_1L_chr = "Country")

The add_default_currency_seed function will perform the previous step using default values that pair a dataset of country currencies with an appropriate dictionary and specify countries as the concept we will be standardising.

We now add A to a new CostlyCountries module instance, Y, which we use to standardise the country concept variable with the fuzzy logic and correspondence tables workflow.

A@include_chr <- A@label_1L_chr <- "Country"
Y <- CostlyCountries(CostlySeed_r4 = A) %>%
  renew("jw", type_1L_chr = "slot", what_1L_chr = "logic") %>%
  renew(new_val_xx = make_country_correspondences("currencies"), what_1L_chr = "correspondences") %>%
  renew(T, type_1L_chr = "slot", what_1L_chr = "force") %>%
  ratify()

We now update X with the results Ready4useDyad from Y (a seed dataset for which country names have been standardised).

X <- renew(X, new_val_xx = CostlySeed(Ready4useDyad_r4 = Y@results_ls$Country_Output_Lookup), what_1L_chr = "seed")

We can now inspect the first few records from our labelled seed dataset.

renewSlot(X, "CostlySeed_r4@Ready4useDyad_r4", type_1L_chr = "label") %>%
  exhibitSlot("CostlySeed_r4@Ready4useDyad_r4", display_1L_chr = "head", scroll_box_args_ls = list(width = "100%"))
Dataset
| Country name | Currency name | Currency symbol | Currency alphabetical ISO code (three letter) | Currency's fractional unit | Number of fractional units in basic unit |
| --- | --- | --- | --- | --- | --- |
| Afghanistan | Afghan afghani | ؋ | AFN | Pul | 100 |
| Albania | Albanian lek | Lek | ALL | Qintar | 100 |
| Algeria | Algerian dinar | DA | DZD | Centime | 100 |
| Andorra | Euro |  | EUR | Cent | 100 |
| Angola | Angolan kwanza | Kz | AOA | Cêntimo | 100 |
| Anguilla | Eastern Caribbean dollar | $ | XCD | Cent | 100 |

We can also inspect the seed dataset’s dictionary.

exhibitSlot(X, "CostlySeed_r4@Ready4useDyad_r4", type_1L_chr = "dict", scroll_box_args_ls = list(width = "100%"))
Data Dictionary
| Variable | Category | Description | Class |
| --- | --- | --- | --- |
| State / Territory[1] | Country | Country name | character |
| Currency[1][2] | Currency | Currency name | character |
| Symbol[D] orAbbrev.[3] | Symbol | Currency symbol | character |
| ISO code[2] | A3 | Currency alphabetical ISO code (three letter) | character |
| Fractionalunit | Fractional | Currency's fractional unit | character |
| Numberto basic | Number | Number of fractional units in basic unit | character |

We specify the seed dataset concept that we want to standardise and the concept that we will use to look up replacement values from the standards dataset.

X@CostlySeed_r4@label_1L_chr <- "Currency"
X@CostlySeed_r4@match_1L_chr <- "A3"

Specify standards

We can now create B, a CostlyStandards module instance that includes a dataset specifying the complete list of allowable variable values. In many cases, the ISO_4217 dataset from the ISOcodes library will be the best source of standardised currency names. The add_currency_standards function pairs this dataset with a dictionary.

We can inspect the first few cases of the labelled version of the standards dataset in B.

renewSlot(B, "Ready4useDyad_r4", type_1L_chr = "label") %>% 
  exhibitSlot("Ready4useDyad_r4", display_1L_chr = "head", scroll_box_args_ls = list(width = "100%"))
Dataset
| Alpabetical currency code (three letters) | Numeric currency code | Currency name |
| --- | --- | --- |
| AED | 784 | UAE Dirham |
| AFN | 971 | Afghani |
| ALL | 008 | Lek |
| AMD | 051 | Armenian Dram |
| ANG | 532 | Netherlands Antillean Guilder |
| AOA | 973 | Kwanza |

We can also inspect the data dictionary contained in B.

exhibitSlot(B, "Ready4useDyad_r4", type_1L_chr = "dict", scroll_box_args_ls = list(width = "100%"))
Data Dictionary
| Variable | Category | Description | Class |
| --- | --- | --- | --- |
| Letter | A3 | Alpabetical currency code (three letters) | character |
| Numeric | N | Numeric currency code | character |
| Currency | Currency | Currency name | character |

We now specify both the concept (“Currency”) that defines the allowable values for our target variable and the concept (“A3”) that we plan to use for lookup matching.

#B@include_chr <- c("Currency", "Letter")
B@label_1L_chr <- "Currency"
B@match_1L_chr <- "A3"

We now add B to X.

X <- renew(X, B, what_1L_chr = "standards")
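
Lookup-code standardisation replaces each seed value with the standard value that shares the same code, which here is the three-letter ISO 4217 code held in the A3 concept. The base-R sketch below illustrates the idea using rows taken from the seed and standards datasets shown above. lookup_by_code and its label / match_on arguments are hypothetical. They only mirror the label_1L_chr and match_1L_chr settings by analogy and are not the costly API.

```r
# Hypothetical sketch of lookup-code standardisation (not the costly API):
# replace seed[[label]] with the standards value that has the same code.
lookup_by_code <- function(seed, standards, label = "Currency", match_on = "A3") {
  idx <- match(seed[[match_on]], standards[[match_on]])
  found <- !is.na(idx)
  # Rows whose code is missing from the standards keep their original value
  seed[[label]][found] <- standards[[label]][idx[found]]
  seed
}

seed <- data.frame(
  Country = c("Afghanistan", "Albania", "Angola"),
  Currency = c("Afghan afghani", "Albanian lek", "Angolan kwanza"),
  A3 = c("AFN", "ALL", "AOA"),
  stringsAsFactors = FALSE
)
standards <- data.frame(
  A3 = c("AED", "AFN", "ALL", "AOA"),
  Currency = c("UAE Dirham", "Afghani", "Lek", "Kwanza"),
  stringsAsFactors = FALSE
)
standardised <- lookup_by_code(seed, standards)
standardised$Currency
```

Because rows with an unmatched code keep their original value, a later validation check would still report them as invalid.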

Compare values of the variable of interest in the seed and standards datasets

Currently, most of our currency names need to be standardised. In many cases this may be due to something as simple as a difference in capitalisation (e.g. "Armenian dram" rather than "Armenian Dram").

X <- ratify(X, new_val_xx = "identity")
X@results_ls$Currency_Output_Validation$Invalid_Values
#>   [1] "Afghan afghani"                          "Albanian lek"                            "Algerian dinar"                          "Angolan kwanza"                         
#>   [5] "Argentine peso"                          "Armenian dram"                           "Aruban florin"                           "Australian dollar"                      
#>   [9] "Azerbaijani manat"                       "Bahamian dollar"                         "Bahraini dinar"                          "Bangladeshi taka"                       
#>  [13] "Barbadian dollar"                        "Belarusian ruble"                        "Belize dollar"                           "Bermudian dollar"                       
#>  [17] "Bhutanese ngultrum"                      "Bitcoin[4] (as legal tender)"            "Bolivian boliviano"                      "Bosnia and Herzegovina convertible mark"
#>  [21] "Botswana pula"                           "Brazilian real"                          "Brunei dollar"                           "Bulgarian lev"                          
#>  [25] "Burmese kyat"                            "Burundian franc"                         "Cambodian riel"                          "Canadian dollar"                        
#>  [29] "Cape Verdean escudo"                     "Cayman Islands dollar"                   "Central African CFA franc"               "CFP franc"                              
#>  [33] "Chilean peso"                            "Colombian peso"                          "Comorian franc"                          "Congolese franc"                        
#>  [37] "Cook Islands dollar"                     "Costa Rican colón"                       "Cuban peso"                              "Czech koruna"                           
#>  [41] "Danish krone"                            "Djiboutian franc"                        "Dominican peso"                          "Eastern Caribbean dollar"               
#>  [45] "Egyptian pound"                          "Eritrean nakfa"                          "Ethiopian birr"                          "Falkland Islands pound"                 
#>  [49] "Faroese króna"                           "Fijian dollar"                           "Gambian dalasi"                          "Georgian lari"                          
#>  [53] "Ghanaian cedi"                           "Gibraltar pound"                         "Guatemalan quetzal"                      "Guernsey pound"                         
#>  [57] "Guinean franc"                           "Guyanese dollar"                         "Haitian gourde"                          "Honduran lempira"                       
#>  [61] "Hong Kong dollar"                        "Hungarian forint"                        "Icelandic króna"                         "Indian rupee"                           
#>  [65] "Indonesian rupiah"                       "Iranian rial"                            "Iraqi dinar"                             "Israeli new shekel"                     
#>  [69] "Jamaican dollar"                         "Japanese yen"                            "Jersey pound"                            "Jordanian dinar"                        
#>  [73] "Kazakhstani tenge"                       "Kenyan shilling"                         "Kiribati dollar[E]"                      "Kuwaiti dinar"                          
#>  [77] "Kyrgyz som"                              "Lao kip"                                 "Lebanese pound"                          "Lesotho loti"                           
#>  [81] "Liberian dollar"                         "Libyan dinar"                            "Macanese pataca"                         "Macedonian denar"                       
#>  [85] "Malagasy ariary"                         "Malawian kwacha"                         "Malaysian ringgit"                       "Maldivian rufiyaa"                      
#>  [89] "Manx pound"                              "Mauritanian ouguiya"                     "Mauritian rupee"                         "Mexican peso"                           
#>  [93] "Moldovan leu"                            "Mongolian tögrög"                        "Moroccan dirham"                         "Mozambican metical"                     
#>  [97] "Namibian dollar"                         "Nepalese rupee"                          "Netherlands Antillean guilder"           "New Taiwan dollar"                      
#> [101] "New Zealand dollar"                      "Nicaraguan córdoba"                      "Nigerian naira"                          "Niue dollar[E]"                         
#> [105] "North Korean won"                        "Norwegian krone"                         "Omani rial"                              "Pakistani rupee"                        
#> [109] "Panamanian balboa"                       "Papua New Guinean kina"                  "Paraguayan guaraní"                      "Peruvian sol"                           
#> [113] "Philippine peso"                         "Pitcairn Islands dollar[E]"              "Polish złoty"                            "Qatari riyal"                           
#> [117] "Renminbi"                                "Romanian leu"                            "Russian ruble"                           "Rwandan franc"                          
#> [121] "Sahrawi peseta"                          "Saint Helena pound"                      "Samoan tālā"                             "São Tomé and Príncipe dobra"            
#> [125] "Saudi riyal"                             "Serbian dinar"                           "Seychellois rupee"                       "Sierra Leonean leone"                   
#> [129] "Singapore dollar"                        "Solomon Islands dollar"                  "Somali shilling"                         "South African rand"                     
#> [133] "South Korean won"                        "South Sudanese pound"                    "Sri Lankan rupee"                        "Sterling"                               
#> [137] "Sudanese pound"                          "Surinamese dollar"                       "Swazi lilangeni"                         "Swedish krona"                          
#> [141] "Swiss franc"                             "Syrian pound"                            "Tajikistani somoni"                      "Tanzanian shilling"                     
#> [145] "Thai baht"                               "Tongan paʻanga[K]"                       "Trinidad and Tobago dollar"              "Tunisian dinar"                         
#> [149] "Turkish lira"                            "Turkmenistani manat"                     "Tuvaluan dollar"                         "Ugandan shilling"                       
#> [153] "Ukrainian hryvnia"                       "United Arab Emirates dirham"             "United States dollar"                    "United States dollar[F]"                
#> [157] "Uruguayan peso"                          "Uzbekistani sum"                         "Vanuatu vatu"                            "Venezuelan digital bolívar"             
#> [161] "Venezuelan sovereign bolívar"            "Vietnamese đồng"                         "West African CFA franc"                  "Yemeni rial"                            
#> [165] "Zambian kwacha"                          "Zimbabwean dollar"
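
One quick way to gauge how many of these mismatches are only a matter of capitalisation is to compare lower-cased values. The base-R sketch below uses three currency names taken from the invalid and absent value lists in this section. It is an illustration only and not a step in the costly workflow.

```r
# Seed values (from Invalid_Values) and standard values (from Absent_Values)
seed_vals <- c("Armenian dram", "Czech koruna", "Afghan afghani")
std_vals <- c("Armenian Dram", "Czech Koruna", "Afghani")

# TRUE where a seed value equals some standard value once case is ignored
case_only <- tolower(seed_vals) %in% tolower(std_vals)
case_only

# Map case-only mismatches onto the correctly capitalised standard value;
# other values become NA and still need lookup codes or correspondences
recased <- std_vals[match(tolower(seed_vals), tolower(std_vals))]
recased
```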

Standardised currency names not currently present in our seed dataset are as follows.

X@results_ls$Currency_Output_Validation$Absent_Values
#>   [1] "ADB Unit of Account"                                               "Afghani"                                                          
#>   [3] "Algerian Dinar"                                                    "Argentine Peso"                                                   
#>   [5] "Armenian Dram"                                                     "Aruban Florin"                                                    
#>   [7] "Australian Dollar"                                                 "Azerbaijan Manat"                                                 
#>   [9] "Bahamian Dollar"                                                   "Bahraini Dinar"                                                   
#>  [11] "Baht"                                                              "Balboa"                                                           
#>  [13] "Barbados Dollar"                                                   "Belarusian Ruble"                                                 
#>  [15] "Belize Dollar"                                                     "Bermudian Dollar"                                                 
#>  [17] "Bolívar Soberano"                                                  "Boliviano"                                                        
#>  [19] "Bond Markets Unit European Composite Unit (EURCO)"                 "Bond Markets Unit European Monetary Unit (E.M.U.-6)"              
#>  [21] "Bond Markets Unit European Unit of Account 17 (E.U.A.-17)"         "Bond Markets Unit European Unit of Account 9 (E.U.A.-9)"          
#>  [23] "Brazilian Real"                                                    "Brunei Dollar"                                                    
#>  [25] "Bulgarian Lev"                                                     "Burundi Franc"                                                    
#>  [27] "Cabo Verde Escudo"                                                 "Cayman Islands Dollar"                                            
#>  [29] "CFA Franc BCEAO"                                                   "CFA Franc BEAC"                                                   
#>  [31] "CFP Franc"                                                         "Chilean Peso"                                                     
#>  [33] "Codes specifically reserved for testing purposes"                  "Colombian Peso"                                                   
#>  [35] "Comorian Franc"                                                    "Congolese Franc"                                                  
#>  [37] "Convertible Mark"                                                  "Cordoba Oro"                                                      
#>  [39] "Costa Rican Colon"                                                 "Cuban Peso"                                                       
#>  [41] "Czech Koruna"                                                      "Dalasi"                                                           
#>  [43] "Danish Krone"                                                      "Denar"                                                            
#>  [45] "Djibouti Franc"                                                    "Dobra"                                                            
#>  [47] "Dominican Peso"                                                    "Dong"                                                             
#>  [49] "East Caribbean Dollar"                                             "Egyptian Pound"                                                   
#>  [51] "El Salvador Colon"                                                 "Ethiopian Birr"                                                   
#>  [53] "Falkland Islands Pound"                                            "Fiji Dollar"                                                      
#>  [55] "Forint"                                                            "Ghana Cedi"                                                       
#>  [57] "Gibraltar Pound"                                                   "Gold"                                                             
#>  [59] "Gourde"                                                            "Guarani"                                                          
#>  [61] "Guinean Franc"                                                     "Guyana Dollar"                                                    
#>  [63] "Hong Kong Dollar"                                                  "Hryvnia"                                                          
#>  [65] "Iceland Krona"                                                     "Indian Rupee"                                                     
#>  [67] "Iranian Rial"                                                      "Iraqi Dinar"                                                      
#>  [69] "Jamaican Dollar"                                                   "Jordanian Dinar"                                                  
#>  [71] "Kenyan Shilling"                                                   "Kina"                                                             
#>  [73] "Kuna"                                                              "Kuwaiti Dinar"                                                    
#>  [75] "Kwanza"                                                            "Kyat"                                                             
#>  [77] "Lao Kip"                                                           "Lari"                                                             
#>  [79] "Lebanese Pound"                                                    "Lek"                                                              
#>  [81] "Lempira"                                                           "Leone"                                                            
#>  [83] "Liberian Dollar"                                                   "Libyan Dinar"                                                     
#>  [85] "Lilangeni"                                                         "Loti"                                                             
#>  [87] "Malagasy Ariary"                                                   "Malawi Kwacha"                                                    
#>  [89] "Malaysian Ringgit"                                                 "Mauritius Rupee"                                                  
#>  [91] "Mexican Peso"                                                      "Mexican Unidad de Inversion (UDI)"                                
#>  [93] "Moldovan Leu"                                                      "Moroccan Dirham"                                                  
#>  [95] "Mozambique Metical"                                                "Mvdol"                                                            
#>  [97] "Naira"                                                             "Nakfa"                                                            
#>  [99] "Namibia Dollar"                                                    "Nepalese Rupee"                                                   
#> [101] "Netherlands Antillean Guilder"                                     "New Israeli Sheqel"                                               
#> [103] "New Taiwan Dollar"                                                 "New Zealand Dollar"                                               
#> [105] "Ngultrum"                                                          "North Korean Won"                                                 
#> [107] "Norwegian Krone"                                                   "Ouguiya"                                                          
#> [109] "Pa’anga"                                                           "Pakistan Rupee"                                                   
#> [111] "Palladium"                                                         "Pataca"                                                           
#> [113] "Peso Convertible"                                                  "Peso Uruguayo"                                                    
#> [115] "Philippine Peso"                                                   "Platinum"                                                         
#> [117] "Pound Sterling"                                                    "Pula"                                                             
#> [119] "Qatari Rial"                                                       "Quetzal"                                                          
#> [121] "Rand"                                                              "Rial Omani"                                                       
#> [123] "Riel"                                                              "Romanian Leu"                                                     
#> [125] "Rufiyaa"                                                           "Rupiah"                                                           
#> [127] "Russian Ruble"                                                     "Rwanda Franc"                                                     
#> [129] "Saint Helena Pound"                                                "Saudi Riyal"                                                      
#> [131] "SDR (Special Drawing Right)"                                       "Serbian Dinar"                                                    
#> [133] "Seychelles Rupee"                                                  "Silver"                                                           
#> [135] "Singapore Dollar"                                                  "Sol"                                                              
#> [137] "Solomon Islands Dollar"                                            "Som"                                                              
#> [139] "Somali Shilling"                                                   "Somoni"                                                           
#> [141] "South Sudanese Pound"                                              "Sri Lanka Rupee"                                                  
#> [143] "Sucre"                                                             "Sudanese Pound"                                                   
#> [145] "Surinam Dollar"                                                    "Swedish Krona"                                                    
#> [147] "Swiss Franc"                                                       "Syrian Pound"                                                     
#> [149] "Taka"                                                              "Tala"                                                             
#> [151] "Tanzanian Shilling"                                                "Tenge"                                                            
#> [153] "The codes assigned for transactions where no currency is involved" "Trinidad and Tobago Dollar"                                       
#> [155] "Tugrik"                                                            "Tunisian Dinar"                                                   
#> [157] "Turkish Lira"                                                      "Turkmenistan New Manat"                                           
#> [159] "UAE Dirham"                                                        "Uganda Shilling"                                                  
#> [161] "Unidad de Fomento"                                                 "Unidad de Valor Real"                                             
#> [163] "Unidad Previsional"                                                "Uruguay Peso en Unidades Indexadas (UI)"                          
#> [165] "US Dollar"                                                         "US Dollar (Next day)"                                             
#> [167] "Uzbekistan Sum"                                                    "Vatu"                                                             
#> [169] "WIR Euro"                                                          "WIR Franc"                                                        
#> [171] "Won"                                                               "Yemeni Rial"                                                      
#> [173] "Yen"                                                               "Yuan Renminbi"                                                    
#> [175] "Zambian Kwacha"                                                    "Zimbabwe Dollar"                                                  
#> [177] "Zloty"

Standardise variable values

We standardise the target variable values, specifying the lookup-codes method rather than the fuzzy-logic (correspondences) method.

X <- ratify(X, type_1L_chr = "Lookup")

This substantially reduces the number of non-standard values for our target variable.

X@results_ls$Currency_Output_Validation$Invalid_Values
#>  [1] "Bitcoin[4] (as legal tender)" "Cook Islands dollar"          "Faroese króna"                "Guernsey pound"               "Jersey pound"                 "Kiribati dollar[E]"          
#>  [7] "Manx pound"                   "Niue dollar[E]"               "Pitcairn Islands dollar[E]"   "Sahrawi peseta"               "Tuvaluan dollar"              "Zimbabwean dollar"
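The lookup-codes method can be pictured as matching each raw value against a table of known spellings (aliases) for each standard name and flagging whatever fails to match. The base R sketch below illustrates only that idea. The alias table and the helper standardise_by_lookup are hypothetical, and this is not how ratify is implemented.

```r
# Hypothetical alias table: each row maps a raw spelling to a standard name.
lookup_df <- data.frame(
  alias = c("Pound Sterling", "British pound", "Euro", "euro",
            "US Dollar", "United States dollar"),
  standard = c("Pound Sterling", "Pound Sterling", "Euro", "Euro",
               "US Dollar", "US Dollar"),
  stringsAsFactors = FALSE
)

# Replace each raw value with its standard name (NA when no alias matches).
standardise_by_lookup <- function(values, lookup) {
  lookup$standard[match(values, lookup$alias)]
}

raw_chr <- c("British pound", "euro", "United States dollar", "Manx pound")
std_chr <- standardise_by_lookup(raw_chr, lookup_df)

# Raw values that could not be standardised (analogous to Invalid_Values).
invalid_chr <- raw_chr[is.na(std_chr)]
print(std_chr)
print(invalid_chr)
```

Adding more aliases to the lookup table is what shrinks the list of invalid values; anything without an alias stays unmatched.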

If we wish, we can remove the values that could not be standardised and then re-apply the lookup standardisation.

X <- renew(X, T, type_1L_chr = "slot", what_1L_chr = "force") 
X <- ratify(X, type_1L_chr = "Lookup")
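Conceptually, removing non-standardised values means keeping only the records whose value belongs to the standard vocabulary. A minimal base R sketch with hypothetical records follows; it does not reproduce what renew does internally.

```r
# Hypothetical records after lookup standardisation (NA = could not be standardised).
records_df <- data.frame(
  country = c("United Kingdom", "Isle of Man", "France"),
  currency = c("Pound Sterling", NA, "Euro"),
  stringsAsFactors = FALSE
)
# Hypothetical standard vocabulary.
valid_chr <- c("Pound Sterling", "Euro", "US Dollar")

# %in% returns FALSE for NA, so unstandardised rows are dropped.
kept_df <- records_df[records_df$currency %in% valid_chr, , drop = FALSE]
print(kept_df)
```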

We can now inspect the result: a dataset whose country names and currency names conform to ISO standards.

X@results_ls$Currency_Output_Lookup %>%
  renew(type_1L_chr = "label") %>%
  exhibit(scroll_box_args_ls = list(width = "100%"))
Dataset
Country name Currency name Currency symbol Currency alphabetical ISO code (three letter) Currency's fractional unit Number of fractional units in basic unit
Afghanistan Afghani ؋‎ AFN Pul 100
Albania Lek Lek ALL Qintar 100
Algeria Algerian Dinar DA DZD Centime 100
Andorra Euro EUR Cent 100
Angola Kwanza Kz AOA Cêntimo 100
Anguilla East Caribbean Dollar $ XCD Cent 100
Antigua and Barbuda East Caribbean Dollar $ XCD Cent 100
Argentina Argentine Peso $ ARS Centavo 100
Armenia Armenian Dram ֏ AMD Luma 100
Aruba Aruban Florin ƒ AWG Cent 100
Saint Helena, Ascension and Tristan da Cunha Saint Helena Pound £ SHP Penny 100
Australia Australian Dollar $ AUD Cent 100
Austria Euro EUR Cent 100
Azerbaijan Azerbaijan Manat AZN Qəpik 100
Bahamas Bahamian Dollar $ BSD Cent 100
Bahrain Bahraini Dinar BD BHD Fils 1000
Bangladesh Taka BDT Poisha 100
Barbados Barbados Dollar $ BBD Cent 100
Belarus Belarusian Ruble Br BYN Copeck 100
Belgium Euro EUR Cent 100
Belize Belize Dollar $ BZD Cent 100
Benin CFA Franc BCEAO Fr XOF Centime 100
Bermuda Bermudian Dollar $ BMD Cent 100
Bhutan Ngultrum Nu BTN Chetrum 100
Bhutan Indian Rupee INR Paisa 100
Bolivia, Plurinational State of Boliviano Bs BOB Centavo 100
Bonaire, Sint Eustatius and Saba US Dollar $ USD Cent 100
Bosnia and Herzegovina Convertible Mark KM BAM Fening 100
Botswana Pula P BWP Thebe 100
Brazil Brazilian Real R$ BRL Centavo 100
British Indian Ocean Territory US Dollar $ USD Cent 100
Virgin Islands, British US Dollar $ USD Cent 100
Brunei Darussalam Brunei Dollar $ BND Sen 100
Brunei Darussalam Singapore Dollar $ SGD Cent 100
Bulgaria Bulgarian Lev Lev BGN Stotinka 100
Burkina Faso CFA Franc BCEAO Fr XOF Centime 100
Burundi Burundi Franc Fr BIF Centime 100
Cambodia Riel KHR Sen 100
Cambodia US Dollar $ USD Cent 100
Cameroon CFA Franc BEAC Fr XAF Centime 100
Canada Canadian Dollar $ CAD Cent 100
Cabo Verde Cabo Verde Escudo $ CVE Centavo 100
Cayman Islands Cayman Islands Dollar $ KYD Cent 100
Central African Republic CFA Franc BEAC Fr XAF Centime 100
Chad CFA Franc BEAC Fr XAF Centime 100
Chile Chilean Peso $ CLP Centavo 100
China Yuan Renminbi ¥ CNY Jiao[G] 10
Colombia Colombian Peso $ COP Centavo 100
Comoros Comorian Franc Fr KMF Centime 100
Congo, The Democratic Republic of the Congolese Franc Fr CDF Centime 100
Congo CFA Franc BEAC Fr XAF Centime 100
Cook Islands New Zealand Dollar $ NZD Cent 100
Costa Rica Costa Rican Colon CRC Céntimo 100
Côte d'Ivoire CFA Franc BCEAO Fr XOF Centime 100
Croatia Euro EUR Cent 100
Cuba Cuban Peso $ CUP Centavo 100
Curaçao Netherlands Antillean Guilder ƒ ANG Cent 100
Cyprus Euro EUR Cent 100
Czechia Czech Koruna CZK Heller 100
Denmark Danish Krone kr DKK Øre 100
Djibouti Djibouti Franc Fr DJF Centime 100
Dominica East Caribbean Dollar $ XCD Cent 100
Dominican Republic Dominican Peso $ DOP Centavo 100
Timor-Leste US Dollar $ USD Centavo 100
Ecuador US Dollar $ USD Centavo 100
Egypt Egyptian Pound LE EGP Piastre[B] 100
El Salvador US Dollar $ USD Cent 100
Equatorial Guinea CFA Franc BEAC Fr XAF Centime 100
Eritrea Nakfa Nkf ERN Cent 100
Estonia Euro EUR Cent 100
Eswatini Lilangeni L or E (pl.) SZL Cent 100
Eswatini Rand R ZAR Cent 100
Ethiopia Ethiopian Birr Br ETB Santim 100
Falkland Islands (Malvinas) Falkland Islands Pound £ FKP Penny 100
Falkland Islands (Malvinas) Pound Sterling £ GBP Penny 100
Faroe Islands Danish Krone kr DKK Øre 100
Fiji Fiji Dollar $ FJD Cent 100
Finland Euro EUR Cent 100
France Euro EUR Cent 100
French Polynesia CFP Franc Fr XPF Centime 100
French Southern Territories Euro EUR Cent 100
Gabon CFA Franc BEAC Fr XAF Centime 100
Gambia Dalasi D GMD Butut 100
Georgia Lari GEL Tetri 100
Germany Euro EUR Cent 100
Ghana Ghana Cedi GHS Pesewa 100
Gibraltar Gibraltar Pound £ GIP Penny 100
Gibraltar Pound Sterling £ GBP Penny 100
Greece Euro EUR Cent 100
Greenland Danish Krone kr DKK Øre 100
Grenada East Caribbean Dollar $ XCD Cent 100
Guatemala Quetzal Q GTQ Centavo 100
Guernsey Pound Sterling £ GBP Penny 100
Guinea Guinean Franc Fr GNF Centime 100
Guinea-Bissau CFA Franc BCEAO Fr XOF Centime 100
Guyana Guyana Dollar $ GYD Cent 100
Haiti Gourde G HTG Centime 100
Honduras Lempira L HNL Centavo 100
Hong Kong Hong Kong Dollar $ HKD Cent 100
Hungary Forint Ft HUF Fillér 100
Iceland Iceland Krona kr ISK Eyrir 100
India Indian Rupee INR Paisa 100
Indonesia Rupiah Rp IDR Sen 100
Iran, Islamic Republic of Iranian Rial Rl or Rls (pl.) IRR Rial 1
Iraq Iraqi Dinar ID IQD Fils 1000
Ireland Euro EUR Cent 100
Isle of Man Pound Sterling £ GBP Penny 100
Israel New Israeli Sheqel ILS Agora 100
Italy Euro EUR Cent 100
Jamaica Jamaican Dollar $ JMD Cent 100
Japan Yen ¥ JPY Sen[C] 100
Jersey Pound Sterling £ GBP Penny 100
Jordan Jordanian Dinar JD JOD Piastre[H] 100
Kazakhstan Tenge KZT Tıyn 100
Kenya Kenyan Shilling Sh or Shs (pl.) KES Cent 100
Kiribati Australian Dollar $ AUD Cent 100
Korea, Democratic People's Republic of North Korean Won KPW Chon 100
Korea, Republic of Won KRW Jeon 100
Kuwait Kuwaiti Dinar KD KWD Fils 1000
Kyrgyzstan Som som KGS Tyiyn 100
Lao People's Democratic Republic Lao Kip LAK Att 100
Latvia Euro EUR Cent 100
Lebanon Lebanese Pound LL LBP Piastre 100
Lesotho Loti L or M (pl.) LSL Sente 100
Lesotho Rand R ZAR Cent 100
South Georgia and the South Sandwich Islands Falkland Islands Pound £ FKP Penny 100
South Georgia and the South Sandwich Islands Pound Sterling £ GBP Penny 100
Liberia Liberian Dollar $ LRD Cent 100
Liberia US Dollar $ USD Cent 100
Libya Libyan Dinar LD LYD Dirham 1000
Liechtenstein Swiss Franc Fr CHF Rappen 100
Lithuania Euro EUR Cent 100
Luxembourg Euro EUR Cent 100
Macao Pataca MOP$ MOP Avo 100
Macao Hong Kong Dollar $ HKD Cent 100
Madagascar Malagasy Ariary Ar MGA Iraimbilanja 5
Malawi Malawi Kwacha K MWK Tambala 100
Malaysia Malaysian Ringgit RM MYR Sen 100
Maldives Rufiyaa Rf MVR Laari 100
Mali CFA Franc BCEAO Fr XOF Centime 100
Malta Euro EUR Cent 100
Marshall Islands US Dollar $ USD Cent 100
Mauritania Ouguiya UM MRU Khoums 5
Mauritius Mauritius Rupee Re or Rs (pl.) MUR Cent 100
Mexico Mexican Peso $ MXN Centavo 100
Micronesia, Federated States of US Dollar $ USD Cent 100
Moldova, Republic of Moldovan Leu Leu or Lei (pl.) MDL Ban 100
Monaco Euro EUR Cent 100
Mongolia Tugrik MNT Möngö 100
Montenegro Euro EUR Cent 100
Montserrat East Caribbean Dollar $ XCD Cent 100
Morocco Moroccan Dirham DH MAD Centime 100
Mozambique Mozambique Metical Mt MZN Centavo 100
Myanmar Kyat K or Ks (pl.) MMK Pya 100
Namibia Namibia Dollar $ NAD Cent 100
Namibia Rand R ZAR Cent 100
Nauru Australian Dollar $ AUD Cent 100
Nepal Nepalese Rupee Re or Rs (pl.) NPR Paisa 100
Nepal Indian Rupee INR Paisa 100
Netherlands Euro EUR Cent 100
New Caledonia CFP Franc Fr XPF Centime 100
New Zealand New Zealand Dollar $ NZD Cent 100
Nicaragua Cordoba Oro C$ NIO Centavo 100
Niger CFA Franc BCEAO Fr XOF Centime 100
Nigeria Naira NGN Kobo 100
Niue New Zealand Dollar $ NZD Cent 100
North Macedonia Denar DEN MKD Deni 100
Norway Norwegian Krone kr NOK Øre 100
Oman Rial Omani RO OMR Baisa 1000
Pakistan Pakistan Rupee Re or Rs (pl.) PKR Paisa 100
Palau US Dollar $ USD Cent 100
Palestine, State of New Israeli Sheqel ILS Agora 100
Palestine, State of Jordanian Dinar JD JOD Piastre[H] 100
Panama Balboa B/ PAB Centésimo 100
Panama US Dollar $ USD Cent 100
Papua New Guinea Kina K PGK Toea 100
Paraguay Guarani PYG Céntimo 100
Peru Sol S/ PEN Céntimo 100
Philippines Philippine Peso PHP Sentimo 100
Pitcairn New Zealand Dollar $ NZD Cent 100
Poland Zloty PLN Grosz 100
Portugal Euro EUR Cent 100
Qatar Qatari Rial QR QAR Dirham 100
Romania Romanian Leu Leu or Lei (pl.) RON Ban 100
Russian Federation Russian Ruble RUB Kopeck 100
Rwanda Rwanda Franc Fr RWF Centime 100
Bonaire, Sint Eustatius and Saba US Dollar $ USD Cent 100
Western Sahara Moroccan Dirham DH MAD Centime 100
Saint Helena, Ascension and Tristan da Cunha Saint Helena Pound £ SHP Penny 100
Saint Helena, Ascension and Tristan da Cunha Pound Sterling £ GBP Penny 100
Saint Kitts and Nevis East Caribbean Dollar $ XCD Cent 100
Saint Lucia East Caribbean Dollar $ XCD Cent 100
Saint Pierre and Miquelon Euro EUR Cent 100
Saint Pierre and Miquelon Canadian Dollar $ CAD Cent 100
Saint Vincent and the Grenadines East Caribbean Dollar $ XCD Cent 100
Samoa Tala $ WST Sene 100
Saint Barthélemy Euro EUR Cent 100
San Marino Euro EUR Cent 100
Sao Tome and Principe Dobra Db STN Cêntimo 100
Saudi Arabia Saudi Riyal Rl or Rls (pl.) SAR Halala 100
Senegal CFA Franc BCEAO Fr XOF Centime 100
Serbia Serbian Dinar DIN RSD Para 100
Seychelles Seychelles Rupee Re or Rs (pl.) SCR Cent 100
Sierra Leone Leone Le SLE Cent 100
Singapore Singapore Dollar $ SGD Cent 100
Singapore Brunei Dollar $ BND Sen 100
Bonaire, Sint Eustatius and Saba US Dollar $ USD Cent 100
Sint Maarten (Dutch part) Netherlands Antillean Guilder ƒ ANG Cent 100
Slovakia Euro EUR Cent 100
Slovenia Euro EUR Cent 100
Solomon Islands Solomon Islands Dollar $ SBD Cent 100
Somalia Somali Shilling Sh or Shs (pl.) SOS Cent 100
South Africa Rand R ZAR Cent 100
South Sudan South Sudanese Pound (none) SSP Piaster 100
Spain Euro EUR Cent 100
Sri Lanka Sri Lanka Rupee Re or Rs (pl.) LKR Cent 100
Sudan Sudanese Pound LS SDG Piastre 100
Suriname Surinam Dollar $ SRD Cent 100
Sweden Swedish Krona kr SEK Öre 100
Switzerland Swiss Franc Fr CHF Rappen[J] 100
Syrian Arab Republic Syrian Pound LS SYP Piastre 100
Taiwan, Province of China New Taiwan Dollar $ TWD Cent 100
Tajikistan Somoni SM TJS Diram 100
Tanzania, United Republic of Tanzanian Shilling Sh or Shs (pl.) TZS Cent 100
Thailand Baht ฿ THB Satang 100
Togo CFA Franc BCEAO Fr XOF Centime 100
Tonga Pa'anga T$ TOP Seniti 100
Trinidad and Tobago Trinidad and Tobago Dollar $ TTD Cent 100
Tunisia Tunisian Dinar DT TND Millime 1000
Turkey Turkish Lira TRY Kuruş 100
Turkmenistan Turkmenistan New Manat m TMT Tenge 100
Turks and Caicos Islands US Dollar $ USD Cent 100
Tuvalu Australian Dollar $ AUD Cent 100
Uganda Uganda Shilling Sh or Shs (pl.) UGX (none) (none)
Ukraine Hryvnia UAH Kopeck 100
United Arab Emirates UAE Dirham Dh or Dhs (pl.) AED Fils 100
United Kingdom Pound Sterling £ GBP Penny 100
United States US Dollar $ USD Cent[A] 100
Uruguay Peso Uruguayo $ UYU Centésimo 100
Uzbekistan Uzbekistan Sum soum UZS Tiyin 100
Vanuatu Vatu VT VUV Cent 100
Holy See (Vatican City State) Euro EUR Cent 100
Venezuela, Bolivarian Republic of Bolívar Soberano Bs.S VES Céntimo 1
Venezuela, Bolivarian Republic of Bolívar Soberano Bs.D VED Céntimo 100
Venezuela, Bolivarian Republic of US Dollar $ USD Cent 100
Viet Nam Dong VND Hào[L] 10
Wallis and Futuna CFP Franc Fr XPF Centime 100
Yemen Yemeni Rial Rl or Rls (pl.) YER Fils 100
Zambia Zambian Kwacha K ZMW Ngwee 100
Zimbabwe US Dollar $ USD Cent 100

5 - Score health utility

Using modules from the scorz R package, individual responses to a multi-attribute utility instrument survey can be converted into health utility total scores. This tutorial describes how to do this for the adolescent version of the AQoL-6D.

The section below renders a vignette article from the scorz library.

Note: This vignette is illustrated with fake data. The dataset explored in this example should not be used to inform decision-making. Some of the methods illustrated in this AQoL-6D vignette can also be used to score other health utility instruments - see a vignette about scoring EQ-5D.

AQoL-6D scoring

To derive a health utility score from the raw responses to a multi-attribute utility instrument it is necessary to implement a scoring algorithm. Scoring algorithms for the Assessment of Quality of Life Six Dimension (AQoL-6D) are publicly available in SPSS format (https://www.aqol.com.au/index.php/scoring-algorithms).

However, to include scoring algorithms in reproducible research workflows, it is desirable to have them available in open-science languages such as R. The scorz package includes ready4 framework modules, part of the ready4 youth mental health economic model, that provide R implementations of the adult and adolescent versions of the AQoL-6D scoring algorithms.
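One common way that scoring algorithms for multi-attribute utility instruments combine item-level disvalues is a multiplicative (Keeney-Raiffa style) model, in which each item's contribution depends on the others rather than being simply added. The base R sketch below shows a normalised version of that idea. The weights, the scaling constant and the helper aggregate_disvalue are made up for illustration; they are not the published AQoL-6D parameters or the scorz implementation.

```r
# Illustrative only: normalised multiplicative aggregation of item disvalues.
# Weights and scaling constant are hypothetical, NOT AQoL-6D parameters.
aggregate_disvalue <- function(d, w, k) {
  stopifnot(length(d) == length(w), k != 0)
  raw <- prod(1 + k * w * d) - 1      # multiplicative combination
  worst <- prod(1 + k * w) - 1        # value when every item disvalue is 1
  raw / worst                         # rescale to a 0-1 disvalue scale
}

w_dbl <- c(0.4, 0.3, 0.5)  # hypothetical item weights
k_dbl <- -0.6              # hypothetical scaling constant

best <- aggregate_disvalue(c(0, 0, 0), w_dbl, k_dbl)
worst <- aggregate_disvalue(c(1, 1, 1), w_dbl, k_dbl)
mid <- aggregate_disvalue(c(0.5, 0.2, 0.8), w_dbl, k_dbl)
print(c(best = best, mid = mid, worst = worst))
```

Because the result is divided by its value at the worst state, it runs from 0 (no disvalue on any item) to 1 (maximum disvalue on every item).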

Ingest data

To begin, we ingest an unscored dataset as an instance of the Ready4useDyad module from the ready4use package. In this case we download the data from a remote repository.

X <- ready4use::Ready4useRepos(dv_nm_1L_chr = "fakes",
                               dv_ds_nm_1L_chr = "https://doi.org/10.7910/DVN/W95KED",
                               dv_server_1L_chr = "dataverse.harvard.edu") %>%
  ingest(fls_to_ingest_chr = "ymh_clinical_dyad_r4",
         metadata_1L_lgl = F) 

To make the ingested dataset easier to interpret, we can add labels from the dictionary.

X <- X %>%
  renew(type_1L_chr = "label")
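Labelling from a dictionary amounts to copying each variable's description onto the matching column. The sketch below stores descriptions in a "label" attribute, a convention used by some R packages. The dictionary column names var_nm and var_desc and the helper add_labels are assumptions for illustration, not the ready4use data dictionary format.

```r
# Hypothetical dataset and data dictionary (column names are assumptions).
ds_df <- data.frame(
  fkClientID = c("Participant_1", "Participant_2"),
  round = factor(c("Baseline", "Baseline"), levels = c("Baseline", "Follow-up")),
  stringsAsFactors = FALSE
)
dict_df <- data.frame(
  var_nm = c("fkClientID", "round"),
  var_desc = c("Unique client identifier", "Round of data collection"),
  stringsAsFactors = FALSE
)

# Attach each variable's description as a "label" attribute.
add_labels <- function(ds, dict) {
  for (nm in intersect(names(ds), dict$var_nm)) {
    attr(ds[[nm]], "label") <- dict$var_desc[dict$var_nm == nm]
  }
  ds
}

labelled_df <- add_labels(ds_df, dict_df)
attr(labelled_df$fkClientID, "label")
```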

We can now inspect our ingested dataset using the exhibit method.

exhibit(X,
        display_1L_chr = "head",
        scroll_box_args_ls = list(width = "100%"))
Dataset
Unique client identifier Round of data collection Date of data collection Age Gender Sex at birth Sexual orientation Aboriginal or Torres Strait Islander Country Of birth Speaks English at home Native English speaker Education and employment status Relationship status Service centre name Primary diagnosis Clinical stage Kessler Psychological Distress Scale (6 Dimension) Patient Health Questionnaire Behavioural Activation for Depression Scale Generalised Anxiety Disorder Scale Overall Anxiety Severity and Impairment Scale Screen for Child Anxiety Related Disorders Social and Occupational Functioning Assessment Scale Assessment of Quality of Life (6 Dimension) question 1 Assessment of Quality of Life (6 Dimension) question 2 Assessment of Quality of Life (6 Dimension) question 3 Assessment of Quality of Life (6 Dimension) question 4 Assessment of Quality of Life (6 Dimension) question 5 Assessment of Quality of Life (6 Dimension) question 6 Assessment of Quality of Life (6 Dimension) question 7 Assessment of Quality of Life (6 Dimension) question 8 Assessment of Quality of Life (6 Dimension) question 9 Assessment of Quality of Life (6 Dimension) question 10 Assessment of Quality of Life (6 Dimension) question 11 Assessment of Quality of Life (6 Dimension) question 12 Assessment of Quality of Life (6 Dimension) question 13 Assessment of Quality of Life (6 Dimension) question 14 Assessment of Quality of Life (6 Dimension) question 15 Assessment of Quality of Life (6 Dimension) question 16 Assessment of Quality of Life (6 Dimension) question 17 Assessment of Quality of Life (6 Dimension) question 18 Assessment of Quality of Life (6 Dimension) question 19 Assessment of Quality of Life (6 Dimension) question 20
Participant_1 Baseline 2020-03-22 14 Male Male Heterosexual No Australia Yes Yes Not studying or working In a relationship Southport Other 0-1a 8 7 96 6 6 28 69 2 3 1 2 3 1 1 2 4 3 3 4 2 4 2 2 2 2 2 1
Participant_2 Baseline 2020-06-15 19 Female Female Heterosexual Yes Other No No Studying only In a relationship Regional Centre Anxiety 0-1a 13 13 63 12 12 41 58 3 3 1 1 3 2 1 3 2 4 4 3 4 3 1 2 2 2 1 1
Participant_3 Baseline 2020-08-20 21 Female Female Other NA NA NA NA Studying only Not in a relationship Canberra Anxiety 1b 12 17 72 16 12 43 72 2 3 2 5 1 1 1 2 4 5 2 4 2 2 2 1 1 1 1 1
Participant_4 Baseline 2020-05-23 12 Female Female Heterosexual Yes Other No No Not studying or working In a relationship Southport Depression and Anxiety 2-4 17 17 75 12 10 51 88 1 2 1 1 3 3 1 4 4 3 3 3 4 2 1 1 2 1 3 1
Participant_5 Baseline 2020-04-05 19 Male Male Heterosexual Yes Other No No Not studying or working Not in a relationship Southport Depression and Anxiety 0-1a 12 22 82 14 14 51 67 2 2 1 3 5 1 1 1 1 5 4 4 3 2 1 2 1 3 2 3
Participant_6 Baseline 2020-06-09 19 Male Male Heterosexual Yes Other No No Studying only In a relationship Regional Centre Anxiety 1b 11 8 105 8 3 46 60 1 2 2 1 2 2 4 1 3 3 4 3 4 2 1 2 1 2 1 1

Next we add metadata identifying our dataset as longitudinal, using the YouthvarsSeries module from the youthvars package.

X <- youthvars::YouthvarsSeries(a_Ready4useDyad = X,
                                id_var_nm_1L_chr = "fkClientID",
                                timepoint_var_nm_1L_chr = "round",
                                timepoint_vals_chr = levels(X@ds_tb$round))
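Once an identifier variable and a timepoint variable are declared, per-participant summaries can be derived, such as the data-collection rounds for which each participant supplied data. The base R sketch below uses hypothetical records and a hypothetical helper, summarise_participation; it is not the youthvars implementation.

```r
# Hypothetical long-format data: one row per participant per round.
long_df <- data.frame(
  fkClientID = c("Participant_1", "Participant_10", "Participant_10", "Participant_2"),
  round = factor(c("Baseline", "Baseline", "Follow-up", "Follow-up"),
                 levels = c("Baseline", "Follow-up")),
  stringsAsFactors = FALSE
)

# For each participant, list (in level order) the rounds with data.
summarise_participation <- function(ds, id_var, time_var) {
  rounds_ls <- split(ds[[time_var]], ds[[id_var]])
  vapply(rounds_ls,
         function(x) paste(levels(x)[levels(x) %in% x], collapse = ";"),
         character(1))
}

participation_chr <- summarise_participation(long_df, "fkClientID", "round")
print(participation_chr)
```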

We then use the data and metadata created in the previous steps to create an instance of the ScorzAqol6Adol class, which is designed specifically to facilitate scoring of the adolescent version of the AQoL-6D instrument.

Y <- ScorzAqol6Adol(a_YouthvarsProfile = X)

By default, instances of the ScorzAqol6Adol class are created with a slot, itm_prefix_1L_chr, that specifies the prefix used in the names of the AQoL-6D questionnaire item-response variables.

procureSlot(Y,
            slot_nm_1L_chr = "itm_prefix_1L_chr")
#> [1] "aqol6d_q"

If this default value needs to be updated to match the prefix used in your dataset, use the renewSlot method.

# Not run
# Y <- renewSlot(Y, slot_nm_1L_chr = "itm_prefix_1L_chr", new_val_xx = "new_prefix")
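As an alternative to changing the slot, you could rename your item columns to use the default prefix before creating the module. This sketch assumes item columns are named as a prefix followed by the item number (for example aqol6d_q1); the original prefix aqol_item_ and the helper swap_prefix are hypothetical.

```r
# Hypothetical dataset whose AQoL-6D items use a different prefix.
ds_df <- data.frame(id = 1:2, aqol_item_1 = c(2, 1), aqol_item_2 = c(3, 2))

# Rename columns starting with old_prefix so they start with new_prefix.
swap_prefix <- function(ds, old_prefix, new_prefix) {
  nms <- names(ds)
  hits <- startsWith(nms, old_prefix)
  nms[hits] <- paste0(new_prefix, substring(nms[hits], nchar(old_prefix) + 1))
  names(ds) <- nms
  ds
}

renamed_df <- swap_prefix(ds_df, "aqol_item_", "aqol6d_q")
names(renamed_df)
```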

Calculating scores

To calculate AQoL-6D adolescent utility scores, use the renew method.

Y <- renew(Y)

Viewing the updated dataset

We can inspect our updated dataset using the exhibit method. We can see that the updated dataset now has additional variables that include the intermediate and final calculations for AQoL-6D adolescent utility scores.

exhibit(Y,
        display_1L_chr = "head",
        scroll_box_args_ls = list(width = "100%"))
Dataset
Unique client identifier Round of data collection Date of data collection Age Gender Sex at birth Sexual orientation Aboriginal or Torres Strait Islander Country Of birth Speaks English at home Native English speaker Education and employment status Relationship status Service centre name Primary diagnosis Clinical stage Kessler Psychological Distress Scale (6 Dimension) Patient Health Questionnaire Behavioural Activation for Depression Scale Generalised Anxiety Disorder Scale Overall Anxiety Severity and Impairment Scale Screen for Child Anxiety Related Disorders Social and Occupational Functioning Assessment Scale Assessment of Quality of Life (6 Dimension) question 1 Assessment of Quality of Life (6 Dimension) question 2 Assessment of Quality of Life (6 Dimension) question 3 Assessment of Quality of Life (6 Dimension) question 4 Assessment of Quality of Life (6 Dimension) question 5 Assessment of Quality of Life (6 Dimension) question 6 Assessment of Quality of Life (6 Dimension) question 7 Assessment of Quality of Life (6 Dimension) question 8 Assessment of Quality of Life (6 Dimension) question 9 Assessment of Quality of Life (6 Dimension) question 10 Assessment of Quality of Life (6 Dimension) question 11 Assessment of Quality of Life (6 Dimension) question 12 Assessment of Quality of Life (6 Dimension) question 13 Assessment of Quality of Life (6 Dimension) question 14 Assessment of Quality of Life (6 Dimension) question 15 Assessment of Quality of Life (6 Dimension) question 16 Assessment of Quality of Life (6 Dimension) question 17 Assessment of Quality of Life (6 Dimension) question 18 Assessment of Quality of Life (6 Dimension) question 19 Assessment of Quality of Life (6 Dimension) question 20 Assessment of Quality of Life (6 Dimension) item disvalue1 Assessment of Quality of Life (6 Dimension) item disvalue2 Assessment of Quality of Life (6 Dimension) item disvalue3 Assessment of Quality of Life (6 Dimension) item disvalue4 Assessment of Quality of Life (6 Dimension) item disvalue5 Assessment of Quality of Life (6 Dimension) item disvalue6 Assessment of Quality of Life (6 Dimension) item disvalue7 Assessment of Quality of Life (6 Dimension) item disvalue8 Assessment of Quality of Life (6 Dimension) item disvalue9 Assessment of Quality of Life (6 Dimension) item disvalue10 Assessment of Quality of Life (6 Dimension) item disvalue11 Assessment of Quality of Life (6 Dimension) item disvalue12 Assessment of Quality of Life (6 Dimension) item disvalue13 Assessment of Quality of Life (6 Dimension) item disvalue14 Assessment of Quality of Life (6 Dimension) item disvalue15 Assessment of Quality of Life (6 Dimension) item disvalue16 Assessment of Quality of Life (6 Dimension) item disvalue17 Assessment of Quality of Life (6 Dimension) item disvalue18 Assessment of Quality of Life (6 Dimension) item disvalue19 Assessment of Quality of Life (6 Dimension) item disvalue20 Disvalue Score for Dimension 1 - Independent Living Disvalue Score for Dimension 2 - Relationships Disvalue Score for Dimension 3 - Mental Health Disvalue Score for Dimension 4 - Coping Disvalue Score for Dimension 5 - Pain Disvalue Score for Dimension 6 - Senses Adult Score Dimension 1 - Independent Living Adult Score Dimension 2 - Relationships Adult Score Dimension 3 - Mental Health Adult Score Dimension 4 - Coping Adult Score Dimension 5 - Pain Adult Score Dimension 6 - Senses Overall score on a 0-1 disvalue scale Overall score on a life-death disutility scale AQoL-6D Adolescent Disutility Score (Untransformed) AQoL-6D Adolescent Disutility Score (Transformed) Instrument utility score Instrument utility score rotated AQOL-6D (weighted total) AQOL-6D (unweighted total)
Participant_1 Baseline 2020-03-22 14 Male Male Heterosexual No Australia Yes Yes Not studying or working In a relationship Southport Other 0-1a 8 7 96 6 6 28 69 2 3 1 2 3 1 1 2 4 3 3 4 2 4 2 2 2 2 2 1 0.073 0.240 0.000 0.040 0.461 0.000 0.000 0.133 0.824 0.330 0.368 0.722 0.055 0.826 0.133 0.2 0.072 0.033 0.024 0.000 0.19334101 0.2964368 0.7312060 0.7708396 0.2619285 0.03009428 0.8066590 0.7035632 0.2687940 0.2291604 0.7380715 0.9699057 0.6436897 0.7286568 0.55838936 0.55838936 0.4416106 0.5078265 0.5698492 46
Participant_10 Baseline 2020-08-05 15 Female Female Other Yes Other No No Studying and working Not in a relationship Canberra Other 0-1a 11 17 34 13 15 38 60 1 2 2 3 5 1 3 3 4 4 3 4 3 3 1 2 2 3 2 1 0.000 0.033 0.041 0.297 1.000 0.000 0.648 0.392 0.824 0.784 0.368 0.722 0.382 0.423 0.000 0.2 0.072 0.223 0.024 0.000 0.27064870 0.7770111 0.8683514 0.6579841 0.1935407 0.13938313 0.7293513 0.2229889 0.1316486 0.3420159 0.8064593 0.8606169 0.7541542 0.8537026 0.74739738 0.74739738 0.2526026 0.3413671 0.3916050 52
Participant_10 Follow-up 2020-11-07 15 Female Female Other Yes Other No No Not studying or working Not in a relationship Regional Centre Depression 1b 7 17 95 14 10 48 64 2 3 2 1 2 2 2 2 2 3 3 5 3 2 3 1 2 2 3 2 0.073 0.240 0.041 0.000 0.074 0.193 0.197 0.133 0.142 0.330 0.368 1.000 0.382 0.057 0.642 0.0 0.072 0.033 0.205 0.187 0.18835933 0.2602305 0.5155772 0.5858738 0.4342728 0.21476953 0.8116407 0.7397695 0.4844228 0.4141262 0.5657272 0.7852305 0.6473112 0.7327563 0.56418597 0.56418597 0.4358140 0.5027214 0.5645345 47
Participant_100 Baseline 2020-07-19 25 Female Female Other Yes Other No No Working only In a relationship Canberra Depression and Anxiety 0-1a 7 0 120 3 0 21 76 1 1 1 1 2 1 2 2 2 2 2 2 5 3 2 1 3 1 1 1 0.000 0.000 0.000 0.000 0.074 0.000 0.197 0.133 0.142 0.097 0.064 0.056 1.000 0.423 0.133 0.0 0.338 0.000 0.000 0.000 0.00000000 0.1433888 0.2505682 0.7769222 0.2866694 0.00000000 1.0000000 0.8566112 0.7494318 0.2230778 0.7133306 1.0000000 0.4558633 0.5160373 0.29587849 0.29587849 0.7041215 0.7390198 0.7978085 36
Participant_1000 Baseline 2020-09-06 16 Male Male Heterosexual Yes Other No No Not studying or working Not in a relationship Canberra Anxiety 0-1a 0 0 128 0 0 0 71 2 1 1 1 1 2 1 2 1 2 2 1 2 3 1 1 1 2 1 1 0.073 0.000 0.000 0.000 0.000 0.193 0.000 0.133 0.000 0.097 0.064 0.000 0.055 0.423 0.000 0.0 0.000 0.033 0.000 0.000 0.02813508 0.1346642 0.1819574 0.3514811 0.0000000 0.01916297 0.9718649 0.8653358 0.8180426 0.6485189 1.0000000 0.9808370 0.2379252 0.2693314 0.08939064 0.08939064 0.9106094 0.9208737 0.9511345 29
Participant_1000 Follow-up 2020-12-20 16 Male Male Heterosexual Yes Other No No Not studying or working Not in a relationship Southport Anxiety 1b 5 0 117 5 1 14 71 2 2 1 1 1 1 2 1 3 1 2 3 2 2 1 1 1 1 2 1 0.073 0.033 0.000 0.000 0.000 0.000 0.197 0.000 0.392 0.000 0.064 0.338 0.055 0.057 0.000 0.0 0.000 0.000 0.024 0.000 0.04719190 0.1002056 0.2658587 0.2080310 0.0000000 0.01111253 0.9528081 0.8997944 0.7341413 0.7919690 1.0000000 0.9888875 0.2228889 0.2523102 0.07926885 0.07926885 0.9207312 0.9297879 0.9576133 31

Creating summary plots

To create plots, we use the depict method.

We can create a list of summary plots by timepoint for all individual items.

plot_ls <- depict(Y, type_1L_chr = "item_by_time")

We can then select a desired item’s summary plot by using its index number.

plot_ls[[1]]
AQoL-6D Item 1 scores by data-collection round

Alternatively, we can generate individual plots by passing the item index number to the var_idcs_int argument of depict.

depict(Y, type_1L_chr = "item_by_time", var_idcs_int = 2L)
AQoL-6D Item 2 scores by data-collection round

We can also plot domain scores by time.

depict(Y, type_1L_chr = "domain_by_time", var_idcs_int = 1L)
AQoL-6D Independent Living Domain weighted scores by data-collection round

Total AQoL-6D scores can also be plotted using the same approach, where var_idcs_int = 1L is used to plot the weighted total distribution and var_idcs_int = 2L is used for plotting the unweighted total.

depict(Y, type_1L_chr = "total_by_time", var_idcs_int = 1L)
AQoL-6D item total weighted scores by data-collection round

Composite plots can be generated as well, though these are not currently optimised to reliably produce quality plots suitable for publication.

depict(Y, type_1L_chr = "comp_item_by_time")
AQoL-6D item responses by data-collection round

depict(Y, type_1L_chr = "comp_domain_by_time")
AQoL-6D weighted domain scores by data-collection round

Share output

We can now publicly share our scored dataset and its associated metadata, using Ready4useRepos and its share method as described in a vignette from the ready4use package.

Z <- ready4use::Ready4useRepos(gh_repo_1L_chr = "ready4-dev/scorz", # Replace with details of your repo.
                               gh_tag_1L_chr = "Documentation_0.0") # You must have write permissions.
Z <- share(Z,
           obj_to_share_xx = Y,
           fl_nm_1L_chr = "ymh_ScorzAqol6Adol")

Y is now available for download as the file ymh_ScorzAqol6Adol.RDS from the “Documentation_0.0” release of the scorz package.

6 - Explore candidate utility mapping models

Using modules from the specific R package, it is possible to undertake an exploratory utility mapping analysis. This tutorial illustrates a hypothetical example of exploring how to map to EQ-5D health utility.

The section below renders a vignette article from the specific library.

Note: This vignette uses fake data - it is for illustrative purposes only and should not be used to inform decision making. The specific package includes ready4 framework model modules that form part of the ready4 youth mental health economic model. Currently, these modules are not optimised to be used directly, but are instead intended for use in other model modules. For example, the TTU package includes modules that extend specific modules to help implement utility mapping studies. However, to illustrate the main features of specific modules, this vignette demonstrates how they could be used independently. In practice, the workflow illustrated in this article would probably need to be performed iteratively in order to identify the optimal model types, predictors and covariates and to update default values to ensure model convergence.

By default, modules in the specific package will request your consent before writing files to your machine. This is the safest option. However, as there are many files that need to be written locally for this program to execute, you can override this default by supplying the value “Y” to methods with a consent_1L_chr argument.

consent_1L_chr <- "" # Default value - asks for consent prior to writing each file.

Import data

We start by ingesting our data. As this example uses EQ-5D data, we import a ScorzEuroQol5 ready4 framework module (created using the steps described in a vignette from the scorz package) into a SpecificConverter module and then apply the metamorphose method to convert it into a SpecificModels module.

X <- SpecificConverter(a_ScorzProfile = ready4use::Ready4useRepos(gh_repo_1L_chr = "ready4-dev/scorz", 
                                                                  gh_tag_1L_chr = "Documentation_0.0") %>%
                         ingest(fls_to_ingest_chr = "ymh_ScorzEuroQol5",  metadata_1L_lgl = F)) %>% 
  metamorphose() 
class(X)
#> [1] "SpecificModels"
#> attr(,"package")
#> [1] "specific"

Inspect data

The dataset we are using has a total of 1786 records at two timepoints on 1068 study participants. The first six records are reproduced below.

Dataset
Unique identifier | Data collection round | Date of data collection | Age | Gender (grouped) | Sex at birth | Sexual orientation | Relationship status | Aboriginal or Torres Strait Islander | Culturally And Linguistically Diverse | Region of residence (metropolitan or regional) | Education and employment status | EQ5D - Mobility domain score | EQ5D - Self-Care domain score | EQ5D - Usual Activities domain score | EQ5D - Pain / Discomfort domain score | EQ5D - Anxiety / Depression domain score | Kessler Psychological Distress - 10 Item Total Score | Overall Wellbeing Measure (Winefield et al. 2012) | EuroQol (EQ-5D) - (weighted total) | EuroQol (EQ-5D) - (unweighted total)
1 | BL | 2019-10-22 | 14 | Male | Male | Heterosexual | In a relationship | No | No | Metro | Not studying or working | 1 | 1 | 1 | 1 | 2 | 11 | 87 | 0.879 | 6
2 | BL | 2019-10-17 | 19 | Female | Female | Heterosexual | In a relationship | Yes | Yes | Regional | Studying only | 1 | 2 | 1 | 1 | 1 | 14 | 65 | 0.846 | 6
2 | FUP | 2020-02-14 | 19 | Female | Female | Heterosexual | In a relationship | Yes | Yes | Regional | Studying only | 3 | 1 | 1 | 1 | 1 | 10 | 71 | 0.850 | 7
3 | BL | 2020-02-15 | 21 | Female | Female | Other | Not in a relationship | NA | NA | Metro | Studying only | 1 | 1 | 3 | 1 | 1 | 13 | 74 | 0.883 | 7
3 | FUP | 2020-06-14 | 21 | Female | Female | Other | Not in a relationship | NA | NA | Metro | Studying only | 1 | 1 | 2 | 1 | 1 | 10 | 64 | 0.906 | 6
4 | BL | 2019-12-14 | 12 | Female | Female | Heterosexual | In a relationship | Yes | Yes | Metro | Not studying or working | 1 | 1 | 1 | 3 | 1 | 18 | 40 | 0.796 | 7
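As a quick check on a long-format dataset like this one, record and participant counts can be reproduced in base R. The sketch below re-enters the identifier, data collection round and EQ-5D columns of the six records shown above (the short column names are our own shorthand, not the variable names stored in X) and also confirms that, in these rows, the unweighted EQ-5D total equals the sum of the five domain scores.

```r
# The six records shown above, re-entered by hand (column names are illustrative)
ds <- data.frame(
  id    = c(1, 2, 2, 3, 3, 4),
  round = c("BL", "BL", "FUP", "BL", "FUP", "BL"),
  mob   = c(1, 1, 3, 1, 1, 1),   # Mobility
  care  = c(1, 2, 1, 1, 1, 1),   # Self-Care
  act   = c(1, 1, 1, 3, 2, 1),   # Usual Activities
  pain  = c(1, 1, 1, 1, 1, 3),   # Pain / Discomfort
  anx   = c(2, 1, 1, 1, 1, 1),   # Anxiety / Depression
  total_unweighted = c(6, 6, 7, 7, 6, 7)
)

n_records      <- nrow(ds)                # one row per record
n_participants <- length(unique(ds$id))   # one identifier per participant
rounds_per_id  <- table(ds$id)            # rounds contributed by each participant
table(ds$round)                           # records per data collection round

# The unweighted total should equal the sum of the five domain scores
domain_cols <- c("mob", "care", "act", "pain", "anx")
all(rowSums(ds[domain_cols]) == ds$total_unweighted)
```

Applied to the full dataset, the same two counts should reproduce the 1786 records and 1068 participants reported above.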

The source dataset of X is contained in the a_YouthvarsProfile slot and is a YouthvarsSeries module. For more information about methods that can be used to explore this dataset, see the relevant vignette from the youthvars package.

Specify parameters

In preparation for exploring our dataset, we need to declare a set of model parameters in the b_SpecificParameters slot of X. This can be done in one step or in several sequential steps. In this example, we proceed sequentially.

Dependent variable

The dependent variable (total EQ-5D utility score) has already been specified when we imported the data from the ScorzEuroQol5 module.

procureSlot(X, "b_SpecificParameters@depnt_var_nm_1L_chr")
#> [1] "eq5d_total_w"

We can now add details of the allowable range of dependent variable values.

X <- renewSlot(X, "b_SpecificParameters@depnt_var_min_max_dbl", c(-1,1))

Candidate predictors

We can now specify the names of candidate predictor variables.

X <- renewSlot(X, "b_SpecificParameters@candidate_predrs_chr", c("K10_int","Psych_well_int")) 

We next add metadata about each candidate predictor variable in the form of a specific_predictors object.

X <- renewSlot(X, "b_SpecificParameters@predictors_lup", class_chr = "integer", class_fn_chr = c("youthvars::youthvars_k10_aus","as.integer"), covariate_lgl = F, increment_dbl = 1,
               long_name_chr = c("Kessler Psychological Distress - 10 Item Total Score", "Overall Wellbeing Measure (Winefield et al. 2012)"), max_val_dbl = c(50,90), min_val_dbl = c(10,18), mdl_scaling_dbl = 0.01,
               short_name_chr = c("K10_int","Psych_well_int"))

The specific_predictors object that we have added to X can be inspected using the exhibitSlot method.

exhibitSlot(X, "b_SpecificParameters@predictors_lup", scroll_box_args_ls = list(width = "100%"))
Variable | Description | Minimum | Maximum | Class | Increment | Function | Scaling | Covariate
K10_int | Kessler Psychological Distress - 10 Item Total Score | 10 | 50 | integer | 1 | youthvars::youthvars_k10_aus | 0.01 | FALSE
Psych_well_int | Overall Wellbeing Measure (Winefield et al. 2012) | 18 | 90 | integer | 1 | as.integer | 0.01 | FALSE
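The lookup table stores a permitted range and a scaling factor for each predictor. As a minimal sketch of how such metadata might be applied, the code below checks the K10 and wellbeing values from the six records shown earlier against their declared ranges and then rescales them. It assumes, without confirmation from the vignette, that the Scaling value (mdl_scaling_dbl) is a multiplicative factor applied before model fitting; predictors_lup here is a plain data frame, not the specific_predictors object.

```r
# Lookup metadata copied from the table above (plain data frame, for illustration)
predictors_lup <- data.frame(
  short_name = c("K10_int", "Psych_well_int"),
  min_val    = c(10, 18),
  max_val    = c(50, 90),
  scaling    = c(0.01, 0.01)
)

# Predictor values from the six records shown earlier
ds <- data.frame(
  K10_int        = c(11L, 14L, 10L, 13L, 10L, 18L),
  Psych_well_int = c(87L, 65L, 71L, 74L, 64L, 40L)
)

# Check each predictor lies within its declared range, then rescale it
# (ASSUMPTION: scaling is multiplicative - a hypothetical interpretation)
scaled <- ds
for (i in seq_len(nrow(predictors_lup))) {
  nm  <- predictors_lup$short_name[i]
  val <- ds[[nm]]
  stopifnot(all(val >= predictors_lup$min_val[i] & val <= predictors_lup$max_val[i]))
  scaled[[nm]] <- val * predictors_lup$scaling[i]
}
scaled
```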

Covariates

We also specify the covariates that we aim to explore in conjunction with each candidate predictor.

X <- renewSlot(X, "b_SpecificParameters@candidate_covars_chr", c("d_sex_birth_s", "d_age",  "d_sexual_ori_s", "d_studying_working"))

Descriptive variables

We also specify variables that we will use for generating descriptive statistics about the dataset.

X <- renewSlot(X,"b_SpecificParameters@descv_var_nms_chr", c("d_age","Gender","d_relation_s", "d_sexual_ori_s", "Region", "d_studying_working")) 

Temporal variables

The name of the dataset variable for data collection timepoint and all of its unique values were imported when converting the ScorzEuroQol5 module.

procureSlot(X,"a_YouthvarsProfile@timepoint_var_nm_1L_chr")
#> [1] "Timepoint"
procureSlot(X,"a_YouthvarsProfile@timepoint_vals_chr")
#> [1] "BL"  "FUP"

However, we also need to specify the name of the variable that contains the datestamp for each dataset record.

X <- renewSlot(X, "b_SpecificParameters@msrmnt_date_var_nm_1L_chr", "data_collection_dtm")

Candidate models

X was created with a default set of candidate models, stored as a specific_models sub-module, which can be inspected using the exhibitSlot method.

exhibitSlot(X, "b_SpecificParameters@candidate_mdls_lup", scroll_box_args_ls = list(width = "100%"))
Model types lookup table
Reference | Name | Control | Family | Function | Start | Predict | Transformation | Binomial | Acronym (Fixed) | Acronym (Mixed) | Type (Mixed) | With
OLS_NTF | Ordinary Least Squares (no transformation) | NA | NA | lm | NA | NA | NTF | FALSE | OLS | LMM | linear mixed model | no transformation
OLS_LOG | Ordinary Least Squares (log transformation) | NA | NA | lm | NA | NA | LOG | FALSE | OLS | LMM | linear mixed model | log transformation
OLS_LOGIT | Ordinary Least Squares (logit transformation) | NA | NA | lm | NA | NA | LOGIT | FALSE | OLS | LMM | linear mixed model | logit transformation
OLS_LOGLOG | Ordinary Least Squares (log log transformation) | NA | NA | lm | NA | NA | LOGLOG | FALSE | OLS | LMM | linear mixed model | log log transformation
OLS_CLL | Ordinary Least Squares (complementary log log transformation) | NA | NA | lm | NA | NA | CLL | FALSE | OLS | LMM | linear mixed model | complementary log log transformation
GLM_GSN_LOG | Generalised Linear Model with Gaussian distribution and log link | NA | gaussian(log) | glm | -0.1,-0.1 | response | NTF | FALSE | GLM | GLMM | generalised linear mixed model | Gaussian distribution and log link
BET_LGT | Beta Regression Model with Binomial distribution and logit link | betareg::betareg.control | NA | betareg::betareg | -0.5,-0.1,3 | response | NTF | FALSE | GLM | GLMM | generalised linear mixed model | Binomial distribution and logit link
BET_CLL | Beta Regression Model with Binomial distribution and complementary log log link | betareg::betareg.control | NA | betareg::betareg | -0.5,-0.1,3 | response | NTF | FALSE | GLM | GLMM | generalised linear mixed model | Binomial distribution and complementary log log link

We can choose to select just a subset of these to explore using the renewSlot method. As this is an illustrative example, we have restricted the models we will explore to just four types, passing the relevant row numbers to the slice_indcs_int argument.

X <- renewSlot(X, "b_SpecificParameters@candidate_mdls_lup", slice_indcs_int = c(1L,5L,7L,8L))
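The indices passed to slice_indcs_int are row positions in the lookup table above. Applying the same positions to a copy of the table's Reference column in base R shows which four model types are retained; they match the four models compared in the results table later in this vignette.

```r
# Reference column of the model types lookup table shown above
mdl_refs <- c("OLS_NTF", "OLS_LOG", "OLS_LOGIT", "OLS_LOGLOG",
              "OLS_CLL", "GLM_GSN_LOG", "BET_LGT", "BET_CLL")

keep <- c(1L, 5L, 7L, 8L)   # the same row positions passed to slice_indcs_int
mdl_refs[keep]
```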

Other parameters

Depending on the type of analysis we plan on undertaking, we can also specify parameters such as the number of folds to use in cross validation, the maximum number of model runs to allow and a seed to ensure reproducibility of results. In this case we are going to use the default values generated when we first created X.

procureSlot(X, "b_SpecificParameters@folds_1L_int")
#> [1] 10
procureSlot(X, "b_SpecificParameters@max_mdl_runs_1L_int")
#> [1] 300
procureSlot(X, "b_SpecificParameters@seed_1L_int")
#> [1] 1234
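The folds and seed values play their usual roles in k-fold cross-validation. As a generic sketch (not the specific package's own implementation), the code below uses the same seed to assign each of the 1786 records reproducibly to one of 10 folds; each fold is then held out in turn as test data while a model is fitted to the other nine.

```r
set.seed(1234)        # same value as seed_1L_int
n_records <- 1786     # records in the example dataset
k <- 10               # same value as folds_1L_int

# Assign each record to a fold, keeping fold sizes as equal as possible
fold_id <- sample(rep(seq_len(k), length.out = n_records))

for (fold in seq_len(k)) {
  test_idx  <- which(fold_id == fold)   # held-out records for this fold
  train_idx <- which(fold_id != fold)   # records used to fit the model
  # fit on train_idx, then compute fit statistics on test_idx
}
table(fold_id)
```

Because sample() is seeded, re-running the code gives the same fold assignment, which is what makes cross-validated fit statistics reproducible.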

Model testing

Before we start to use the data stored in X to undertake modelling, we must first validate that it contains all necessary (and internally consistent) data by using the ratify method. The call to ratify will update any variable names that are likely to cause problems when generating reports (e.g. through inclusion of characters like “_” in the variable name that can cause problems when rendering LaTeX documents).

X <- ratify(X)
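The comparison tables later in this vignette label the two predictors K10 and Psychwell rather than K10_int and Psych_well_int, consistent with names being simplified for reporting. The helper below is hypothetical: it reproduces those labels by dropping an _int suffix and removing underscores, but it is not ratify's actual logic, and the vignette does not state which step generates the labels.

```r
# Hypothetical helper - not part of the specific package
make_report_safe_nm <- function(x) {
  x <- sub("_int$", "", x)   # drop an integer type suffix
  gsub("_", "", x)           # remove underscores, which LaTeX reserves for math-mode subscripts
}

make_report_safe_nm(c("K10_int", "Psych_well_int"))
```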

Set-up workspace

We add details of the directory to which we will write all output. In this example we create a temporary directory (tempdir()), but in practice this would be an existing directory on your local machine.

X <- renewSlot(X, "paths_chr", tempdir())

It can be helpful to save fake data (useful for demonstrating the generalisability and replicability of an analysis) and real data (required for write-up and reproducibility) in distinctly labelled directories. By default, X is created with a flag to save all output in a sub-directory “Real”. As we are using fake data, we can override this value.

X <- renewSlot(X, "b_SpecificParameters@fake_1L_lgl", T)

We can now write a number of sub-directories to our specified output directory.

X <- author(X, what_1L_chr = "workspace", consent_1L_chr = consent_1L_chr)
#> New directories created:
#> C:\Users\mham0053\AppData\Local\Temp\RtmpWkpbqI/Fake
#> C:\Users\mham0053\AppData\Local\Temp\RtmpWkpbqI/Fake/Markdown
#> C:\Users\mham0053\AppData\Local\Temp\RtmpWkpbqI/Fake/Output
#> C:\Users\mham0053\AppData\Local\Temp\RtmpWkpbqI/Fake/Reports
#> C:\Users\mham0053\AppData\Local\Temp\RtmpWkpbqI/Fake/Output/_Descriptives
#> C:\Users\mham0053\AppData\Local\Temp\RtmpWkpbqI/Fake/Output/H_Dataverse
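The sketch below recreates the directory layout printed above under a temporary directory using base R, to show what the workspace looks like. It mirrors the listed paths only; it does not reproduce the files that author later writes into them, and it is not how author is implemented.

```r
root <- file.path(tempdir(), "Fake")
sub_dirs <- c("Markdown", "Output", "Reports",
              file.path("Output", "_Descriptives"),
              file.path("Output", "H_Dataverse"))

# recursive = TRUE also creates any missing parent directories
for (d in file.path(root, sub_dirs)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}
list.dirs(root, full.names = FALSE)
```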

Descriptives

The first set of outputs we write to our output directories is a set of descriptive tables and plots.

X <- author(X, consent_1L_chr = consent_1L_chr, digits_1L_int = 3L,  what_1L_chr = "descriptives")

Model comparisons

The investigate method can now be used to compare the candidate models we have specified earlier. In so doing it will transform X into a SpecificPredictors object.

X <- investigate(X, consent_1L_chr = consent_1L_chr, depnt_var_max_val_1L_dbl = 0.99, session_ls = sessionInfo())
class(X)
#> [1] "SpecificPredictors"
#> attr(,"package")
#> [1] "specific"
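The class() output shows that investigate returned X as an object of a new class: SpecificModels before the call and SpecificPredictors after it, with later calls moving it on to SpecificFixed and SpecificMixed. The package attribute in that output indicates these are S4 objects. The hypothetical sketch below, with made-up class names, shows how S4 method dispatch lets a single generic run a different step each time and return an object of the next class, which is also why the order of calls matters. It is not the specific package's code.

```r
library(methods)

# Hypothetical stage classes; each records the steps completed so far
setClass("StageOne",   slots = c(log = "character"))
setClass("StageTwo",   slots = c(log = "character"))
setClass("StageThree", slots = c(log = "character"))

setGeneric("advance", function(x) standardGeneric("advance"))

# Each method runs a different step and returns an object of the next class
setMethod("advance", "StageOne", function(x)
  new("StageTwo", log = c(x@log, "compared model types")))
setMethod("advance", "StageTwo", function(x)
  new("StageThree", log = c(x@log, "compared predictors")))
setMethod("advance", "StageThree", function(x)
  stop("All stages complete; re-running an earlier step is not supported"))

x <- new("StageOne", log = character(0))
x <- advance(x)   # now a StageTwo object
x <- advance(x)   # now a StageThree object
x@log
```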

The investigate method will write each model to be tested to a new sub-directory of our output directory.

The investigate method also outputs a table summarising the performance of each of the candidate models.

exhibit(X, what_1L_chr = "mdl_cmprsn", type_1L_chr = "results") 
Comparison of candidate models using highest correlated predictor

Training and testing model fit statistics are averaged over 10 folds.

Model | Training R-Squared | Training RMSE | Training MAE | Testing R-Squared | Testing RMSE | Testing MAE
Beta Regression Model with Binomial distribution and logit link | 0.4318533 | 0.0742448 | 0.0587307 | 0.4128497 | 0.0741236 | 0.0587733
Beta Regression Model with Binomial distribution and complementary log log link | 0.4174181 | 0.0751836 | 0.0593447 | 0.3996947 | 0.0750880 | 0.0594047
Ordinary Least Squares (no transformation) | 0.4106104 | 0.0756222 | 0.0596955 | 0.3933147 | 0.0755461 | 0.0597672
Ordinary Least Squares (complementary log log transformation) | 0.4105040 | 0.0756284 | 0.0597793 | 0.3913360 | 0.0755268 | 0.0598295
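The three fit statistics in the table can be computed from observed and predicted values as sketched below. R-squared is shown here as one minus the ratio of residual to total sum of squares; some packages report the squared correlation between observed and predicted values instead, so this generic version may not match specific's figures exactly. In cross-validation, each statistic is computed for every fold and then averaged.

```r
fit_stats <- function(obs, pred) {
  res <- obs - pred
  c(
    R_squared = 1 - sum(res^2) / sum((obs - mean(obs))^2),
    RMSE      = sqrt(mean(res^2)),   # root mean squared error
    MAE       = mean(abs(res))       # mean absolute error
  )
}

fit_stats(obs = c(1, 2, 3, 4), pred = c(1, 2, 3, 5))
# R_squared 0.8, RMSE 0.5, MAE 0.25
```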

We can now identify the highest performing model in each category of candidate model based on the testing R-squared statistic.

procure(X, what_1L_chr = "prefd_mdls") 
#> [1] "BET_LGT" "OLS_NTF"

We can override these automated selections and instead incorporate other considerations (possibly based on judgments informed by visual inspection of the plots and the desirability of constraining predictions to a maximum value of one). We do this in the following command, specifying new preferred model types, in descending order of preference.

X <- renew(X, new_val_xx = c("BET_LGT", "OLS_CLL"), type_1L_chr = "results", what_1L_chr = "prefd_mdls")

Use most preferred model to compare all candidate predictors

We can now compare all of our candidate predictors (with and without candidate covariates) using the most preferred model type.

X <- investigate(X, consent_1L_chr = consent_1L_chr)
class(X)
#> [1] "SpecificFixed"
#> attr(,"package")
#> [1] "specific"

Now, we compare the performance of single predictor models of our preferred model type (in our case, a Beta Regression Model with Binomial distribution and logit link) for each candidate predictor. The last call to investigate saved the tested models along with model plots in a sub-directory of our output directory. These results are also viewable as a table.

exhibit(X, scroll_box_args_ls = list(width = "100%"), type_1L_chr = "results", what_1L_chr = "predr_cmprsn")
Comparison of all candidate predictors using preferred model
Predictor | %IncMSE | IncNodePurity
K10 | 0.0066197 | 3.888246
Psychwell | 0.0011094 | 2.342784
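The %IncMSE and IncNodePurity columns carry the names of the two variable importance measures reported by the randomForest R package, which suggests (though the vignette does not say so) that this ranking comes from a random forest. The idea behind %IncMSE is permutation importance: shuffle one predictor and measure how much prediction error rises. The simplified sketch below applies that idea to a linear model on simulated data with hypothetical variable names; randomForest itself computes it on out-of-bag data for each tree and scales the result, so the numbers are not comparable.

```r
set.seed(42)
n   <- 500
sim <- data.frame(x_strong = rnorm(n), x_weak = rnorm(n))
sim$y <- 2 * sim$x_strong + 0.1 * sim$x_weak + rnorm(n, sd = 0.5)

fit      <- lm(y ~ x_strong + x_weak, data = sim)
base_mse <- mean((sim$y - predict(fit, sim))^2)

# Increase in MSE after randomly permuting each predictor in turn
perm_importance <- sapply(c("x_strong", "x_weak"), function(v) {
  shuffled <- sim
  shuffled[[v]] <- sample(shuffled[[v]])
  mean((sim$y - predict(fit, shuffled))^2) - base_mse
})
perm_importance
```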

The most recent call to the investigate method also saved single predictor R model objects (one for each candidate predictor) along with two plots for each model in a sub-directory of our output directory. The performance of each single predictor model can also be summarised in a table.

exhibit(X, type_1L_chr = "results", what_1L_chr = "fxd_sngl_cmprsn")
Preferred single predictor model performance by candidate predictor

Training and testing model fit statistics are averaged over 10 folds.

Predictor | Training R-Squared | Training RMSE | Training MAE | Testing R-Squared | Testing RMSE | Testing MAE
K10 | 0.4318533 | 0.0742448 | 0.0587307 | 0.4128497 | 0.0741236 | 0.0587733
Psychwell | 0.1507472 | 0.0907813 | 0.0699606 | 0.1341090 | 0.0909203 | 0.0700686

Updated versions of each of the models in the previous step (this time with covariates added) are saved to a new sub-directory of the output directory, and we can summarise the performance of each of the updated models, along with all significant model terms, in a table.

exhibit(X, scroll_box_args_ls = list(width = "100%"), type_1L_chr = "results", what_1L_chr = "fxd_full_cmprsn")

We can now identify which, if any, of the candidate covariates we previously specified are significant predictors in any of the models.

procure(X, type_1L_chr = "results", what_1L_chr = "signt_covars")
#> [1] NA
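An NA result means that none of the candidate covariates met the significance criterion. As a generic illustration only, significant terms can be extracted from a fitted model's coefficient table in base R. The data are simulated, the covariate names are hypothetical, and the 0.05 threshold is an assumption rather than the criterion used by specific.

```r
set.seed(7)
n   <- 300
sim <- data.frame(predictor = rnorm(n), covar_a = rnorm(n), covar_b = rnorm(n))
# covar_a has a real effect on y; covar_b has none
sim$y <- 1.5 * sim$predictor + 0.8 * sim$covar_a + rnorm(n)

fit    <- lm(y ~ predictor + covar_a + covar_b, data = sim)
p_vals <- coef(summary(fit))[, "Pr(>|t|)"]

candidate_covars <- c("covar_a", "covar_b")
signt_covars <- candidate_covars[p_vals[candidate_covars] < 0.05]
if (length(signt_covars) == 0) signt_covars <- NA_character_   # mirror an NA result
signt_covars
```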

We can override the automated covariate selection, for example to select only covariates that are significant in all or most of the models. In this example, however, we opt not to do so and continue to use no covariates, as selected by the algorithm in the previous step.

# X <- renew(X, new_val_xx = c("COVARIATE OF YOUR CHOICE", "ANOTHER COVARIATE"), type_1L_chr = "results", what_1L_chr = "prefd_covars")

Test preferred model with preferred covariates for each candidate predictor

We now conclude our model testing by rerunning the previous step, except confining our covariates to those we prefer.

X <- investigate(X, consent_1L_chr = consent_1L_chr)
class(X)
#> [1] "SpecificMixed"
#> attr(,"package")
#> [1] "specific"

The previous call to investigate saved the tested models along with two plots for each model in the “E_Predrs_W_Covars_Sngl_Mdl_Cmprsn” sub-directory of “Output”.

Apply preferred model types and predictors to longitudinal data

The next main step is to use the preferred model types and covariates identified from the preceding analysis of cross-sectional data in longitudinal analysis.

Longitudinal mixed modelling

Prior to undertaking longitudinal mixed modelling, we need to check the appropriateness of the default values for modelling parameters that are stored in X. These include the number of model iterations, and any custom control parameters and priors (by default, empty lists).

procureSlot(X, "b_SpecificParameters@iters_1L_int")
#> [1] 4000

In many cases there will be no need to specify any custom control parameters or priors and using the defaults may speed up execution.

procureSlot(X, "b_SpecificParameters@control_ls")
#> [[1]]
#> list()
procureSlot(X,"b_SpecificParameters@prior_ls")
#> [[1]]
#> list()

However, in this example using the default control parameters would result in warning messages suggesting a change to the adapt_delta control value (default = 0.8). Modifying the adapt_delta control parameter value can address this issue.

X <- renewSlot(X, "b_SpecificParameters@control_ls", new_val_xx = list(adapt_delta = 0.99))
X <- investigate(X, consent_1L_chr = consent_1L_chr)
class(X)
#> [1] "SpecificMixed"
#> attr(,"package")
#> [1] "specific"

The last call to investigate wrote the models it tested to a sub-directory of the output directory, along with plots for each model.

Create shareable outputs

The model objects created by the preceding analysis are not suitable for sharing as they contain duplicates of the source dataset. To create model objects that can be shared (where dataset copies are replaced with fake data) use the authorData method.

X <- authorData(X, consent_1L_chr = consent_1L_chr)

Purge dataset copies

For the purposes of efficient computation, multiple objects containing copies of the source dataset were saved to our output directory during the analysis process. We therefore need to delete all of these copies by supplying “purge_write” to the type_1L_chr argument of the author method.

X <- author(X, consent_1L_chr = consent_1L_chr, type_1L_chr = "purge_write")

A copy of the module X is available for download as the file eq5d_ttu_SpecificMixed.RDS from the “Documentation_0.0” release of the specific package.

7 - Implement a utility mapping study

Using modules from the TTU R package, it is possible to implement a fully reproducible utility mapping study. This tutorial illustrates the main steps using a hypothetical AQoL-6D utility mapping study.

The section below renders a vignette article from the TTU library.

Note: This vignette uses fake data - it is for illustrative purposes only and should not be used to inform decision making. This vignette outlines the workflow for developing utility mapping models using longitudinal data. The workflow for developing utility mapping models using cross-sectional data is broadly similar, with some minor modifications. An example of developing models using cross-sectional data is available at https://doi.org/10.5281/zenodo.8098595.

Motivation

Health services do not typically collect health utility data from their clients, which makes it more difficult to place an economic value on outcomes attained in these services. One strategy for addressing this gap is to use data from similar samples of patients that contain both health utility and the types of outcome measures that are collected in clinical services. The TTU package provides a toolkit for conducting and reporting a utility mapping (or Transfer to Utility) study.

Implementation

The TTU package contains modules of the ready4 youth mental health economic model that combine and extend model modules for:

  • labeling, validating and summarising youth mental health datasets (from the youthvars package);
  • scoring health utility (from the scorz package);
  • specifying and testing statistical models (from the specific package);
  • generating reproducible analysis reports (from the ready4show package); and
  • sharing data via online data repositories (from the ready4use package).

Additionally, TTU relies on two RMarkdown programs.

Outputs generated by the TTU package are designed to be compatible with health economic models developed with the ready4 framework.

Workflow

Background and citation

The following workflow illustrates (using fake data) the same steps we used in a real world study, a summary of which is available at https://doi.org/10.1101/2021.07.07.21260129. Citation information for that study is:

@article {Hamilton2021.07.07.21260129,
    author = {Hamilton, Matthew P and Gao, Caroline X and Filia, Kate M and Menssink, Jana M and Sharmin, Sonia and Telford, Nic and Herrman, Helen and Hickie, Ian B and Mihalopoulos, Cathrine and Rickwood, Debra J and McGorry, Patrick D and Cotton, Sue M},
    title = {Predicting Quality Adjusted Life Years in young people attending primary mental health services},
    elocation-id = {2021.07.07.21260129},
    year = {2021},
    doi = {10.1101/2021.07.07.21260129},
    publisher = {Cold Spring Harbor Laboratory Press},
    URL = {https://www.medrxiv.org/content/early/2021/07/12/2021.07.07.21260129},
    eprint = {https://www.medrxiv.org/content/early/2021/07/12/2021.07.07.21260129.full.pdf},
    journal = {medRxiv}
}

The program applied in that study, which this workflow closely resembles, is available at https://doi.org/10.5281/zenodo.6116077 and can be cited as follows:

@software{hamilton_matthew_2022_6212704,
  author       = {Hamilton, Matthew and
                  Gao, Caroline},
  title        = {{Complete study program to reproduce all steps from 
                   data ingest through to results dissemination for a
                   study to map mental health measures to AQoL-6D
                   health utility}},
  month        = feb,
  year         = 2022,
  note         = {{Matthew Hamilton and Caroline Gao  (2022). 
                   Complete study program to reproduce all steps from
                   data ingest through to results dissemination for a
                   study to map mental health measures to AQoL-6D
                   health utility. Zenodo.
                   https://doi.org/10.5281/zenodo.6116077. Version
                   0.0.9.3}},
  publisher    = {Zenodo},
  version      = {0.0.9.3},
  doi          = {10.5281/zenodo.6212704},
  url          = {https://doi.org/10.5281/zenodo.6212704}
}

Load required packages

We begin by loading our required packages.

By default, methods associated with TTU modules will request your consent before writing files to your machine. This is the safest option. However, as there are many files that need to be written locally for this program to execute, you can override this default by supplying the value “Y” to methods with a consent_1L_chr argument.

consent_1L_chr <- "" # Default value - asks for consent prior to writing each file.

Add dataset metadata

We use the Ready4useRepos and Ready4useDyad modules to retrieve and ingest a dataset and its data dictionary, and then to pair them.

A <- Ready4useDyad(ds_tb = Ready4useRepos(dv_nm_1L_chr = "fakes", dv_ds_nm_1L_chr = "https://doi.org/10.7910/DVN/HJXYKQ", dv_server_1L_chr = "dataverse.harvard.edu") %>%
                     ingest(fls_to_ingest_chr = c("ymh_clinical_tb"), metadata_1L_lgl = F) %>% youthvars::transform_raw_ds_for_analysis(),
                   dictionary_r3 = Ready4useRepos(dv_nm_1L_chr = "TTU", dv_ds_nm_1L_chr = "https://doi.org/10.7910/DVN/DKDIB0", dv_server_1L_chr = "dataverse.harvard.edu") %>%
                     ingest(fls_to_ingest_chr = c("dictionary_r3"), metadata_1L_lgl = F)) %>%
  renew(type_1L_chr = "label")

We use the YouthvarsSeries module to supply metadata about our longitudinal dataset.

A <- YouthvarsSeries(a_Ready4useDyad = A, id_var_nm_1L_chr = "fkClientID", timepoint_var_nm_1L_chr = "round",
                     timepoint_vals_chr = levels(procureSlot(A, "ds_tb")$round))

Score health utility

We next use the ScorzAqol6Adol module to score adolescent AQoL-6D health utility.

A <- TTUProject(a_ScorzProfile = ScorzAqol6Adol(a_YouthvarsProfile = A))
A <- renew(A, what_1L_chr = "utility") 
#> Joining with `by = join_by(fkClientID, match_var_chr)`

Evaluate candidate models

Over the next few steps we will use modules from the specific package to specify and assess a number of candidate utility mapping models.

Specify modelling parameters

We begin by specifying the parameters we will use in our modelling project. The initial step is to ensure the fields in A for storing parameter values are internally consistent with the data we have entered in the previous steps.

A <- renew(A, what_1L_chr = "parameters")

We next ingest a lookup table of metadata about the variables we plan to explore as candidate predictors. In this case, we are sourcing the lookup table from an online data repository.

A <- renew(A, "use_renew_mthd", fl_nm_1L_chr = "predictors_r3", type_1L_chr = "predictors_lup", 
           y_Ready4useRepos = Ready4useRepos(dv_nm_1L_chr = "TTU", dv_ds_nm_1L_chr = "https://doi.org/10.7910/DVN/DKDIB0", 
                                             dv_server_1L_chr = "dataverse.harvard.edu"),
           what_1L_chr = "parameters")

We can inspect the metadata on candidate predictors that we have just ingested.

exhibit(A, scroll_box_args_ls = list(width = "100%"))

We add additional metadata about variables in our dataset that will be used in exploratory modelling.

A <- renew(A, c(0.03,1), type_1L_chr = "range", what_1L_chr = "parameters") %>%
  renew(c("BADS","GAD7", "K6", "OASIS", "PHQ9", "SCARED"),
        type_1L_chr = "predictors_vars", what_1L_chr = "parameters") %>%
  renew(c("d_sex_birth_s", "d_age",  "d_sexual_ori_s", "d_studying_working", "c_p_diag_s", "c_clinical_staging_s", "SOFAS"),     
        type_1L_chr = "covariates", what_1L_chr = "parameters") %>%
  renew(c("d_age","Gender","d_relation_s", "d_sexual_ori_s" ,"Region", "d_studying_working", "c_p_diag_s", "c_clinical_staging_s","SOFAS"), 
        type_1L_chr = "descriptives", what_1L_chr = "parameters") %>%
  renew("d_interview_date", type_1L_chr = "temporal", what_1L_chr = "parameters")

We record that the data we are working with is fake (this step can be skipped if working with real data).

A <- renew(A, T, type_1L_chr = "is_fake", what_1L_chr = "parameters")

We update A for internal consistency with the values we have previously supplied and create a local workspace to which output files will be written.

A <- renew(A, consent_1L_chr = consent_1L_chr, paths_chr = tempdir(), what_1L_chr = "project")

We now generate tables and charts that describe our dataset. These are saved in a sub-directory of our output data directory, and are available for download. One of the plots is also reproduced here.

A <- author(A, consent_1L_chr = consent_1L_chr, digits_1L_int = 3L, what_1L_chr = "descriptives")

We next compare the performance of different model types. We perform this step using the investigate method. This is the first of several times that we use this method. Each time the method is called, A is updated so that the next time the method is called, a different algorithm will be used. The sequence of calls to investigate is therefore important: it should follow the order outlined in this example, and you should not attempt to repeat a call to investigate to redo a prior step.

A <- investigate(A, consent_1L_chr = consent_1L_chr, depnt_var_max_val_1L_dbl = 0.9999, session_ls = sessionInfo())

The outputs of the previous command are saved into a sub-directory of our output directory, and an example of this output is available for download. Once we inspect this output, we can then specify the preferred model types to use from this point onwards.

A <- renew(A, c("GLM_GSN_LOG", "OLS_CLL"), type_1L_chr = "models", what_1L_chr = "results")

Next we assess multiple versions of our preferred model type - one single predictor model for each of our candidate predictors and the same models with candidate covariates added.

A <- investigate(A, consent_1L_chr = consent_1L_chr)

The previous step saved output into a sub-directory of our output directory. Example output (single predictor comparisons and multivariate model comparisons) is available for download. After reviewing this output, we can specify the covariates we wish to add to the models we will assess from this point forward.

A <- renew(A, "SOFAS", type_1L_chr = "covariates", what_1L_chr = "results")

We can now assess the multivariate models.

A <- investigate(A, consent_1L_chr = consent_1L_chr)

As a result of the previous step, more model objects and plot files have been saved to a sub-directory of our output directory, and examples of this output are available for download. Once we inspect this output, we can reformulate the models we finalised in the previous step so that they are suitable for modelling longitudinal change. For our primary analysis, we use a mixed model formulation of the models that we previously selected. A series of large model files are written to the local output data directory.

A <- investigate(A, consent_1L_chr = consent_1L_chr)

For our secondary analyses, we specify alternative combinations of predictors and covariates.

A <- investigate(A, consent_1L_chr = consent_1L_chr,
                 scndry_anlys_params_ls = make_scndry_anlys_params(candidate_predrs_chr = c("SOFAS"),
                                                                   candidate_covar_nms_chr = c("d_sex_birth_s", "d_age", "d_sexual_ori_s", "d_studying_working"),
                                                                   prefd_covars_chr = NA_character_) %>%
                   make_scndry_anlys_params(candidate_predrs_chr = c("SCARED","OASIS","GAD7"),
                                            candidate_covar_nms_chr = c("PHQ9", "SOFAS", "d_sex_birth_s", "d_age", "d_sexual_ori_s", "d_studying_working"),
                                            prefd_covars_chr = "PHQ9"))

Report findings

Create shareable models

The model objects created and saved in our working directory by the preceding steps are not suitable for public dissemination. Their files are too large and, more importantly, they include copies of our source dataset. We can overcome these limitations by creating shareable versions of the models. Two types of shareable version are created: copies of the original model objects in which fake data overwrite the original source data, and summary tables of model coefficients.

A <- author(A, consent_1L_chr = consent_1L_chr, what_1L_chr = "models")
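Why a fitted model object is unsafe to share, while a coefficient table is safe, can be seen with a base R linear model fitted to simulated data. This is an illustration only: it uses lm() rather than the model types fitted in this study, and the variable names are hypothetical.

```r
set.seed(42)

# Simulated stand-in for a confidential dataset (hypothetical variables)
conf_df <- data.frame(phq9 = sample(0:27, 200, replace = TRUE))
conf_df$utility <- 0.9 - 0.02 * conf_df$phq9 + rnorm(200, sd = 0.05)

fit <- lm(utility ~ phq9, data = conf_df)

# The fitted object carries row-level information: the model frame,
# residuals and fitted values each have one entry per record
c(model_frame_rows = nrow(fit$model), n_residuals = length(residuals(fit)))

# A coefficient summary table holds only aggregate statistics
coef_mat <- coef(summary(fit))
coef_tb <- data.frame(Parameter = rownames(coef_mat), coef_mat,
                      check.names = FALSE)
rownames(coef_tb) <- NULL
coef_tb
```

Because row-level data are spread across several components of a fitted object, fully de-identifying one means dealing with every such component, which is one reason coefficient tables are the simpler shareable format.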

Specify study reporting metadata

We update A so that we can begin to use it to render and share reports.

A <- renew(A, what_1L_chr = "reporting")

We now populate these fields with metadata relevant to the reports that we will generate. Note that the details we supply to the Ready4useRepos object below must relate to a repository to which we have write permissions (otherwise subsequent steps will fail).

A <- renew(A, ready4show::authors_tb, type_1L_chr = "authors", what_1L_chr = "reporting") %>%
  renew(ready4show::institutes_tb, type_1L_chr = "institutes", what_1L_chr = "reporting") %>%
  renew(c(3L,3L), type_1L_chr = "digits", what_1L_chr = "reporting") %>%
  renew(c("PDF","PDF"), type_1L_chr = "formats", what_1L_chr = "reporting") %>%
  renew("A hypothetical utility mapping study using fake data", type_1L_chr = "title", what_1L_chr = "reporting") %>%
  renew(renew(ready4show_correspondences(), old_nms_chr = c("PHQ9", "GAD7"), new_nms_chr = c("PHQ-9", "GAD-7")), type_1L_chr = "changes", what_1L_chr = "reporting") %>%
  renew(Ready4useRepos(dv_nm_1L_chr = "fakes", dv_ds_nm_1L_chr = "https://doi.org/10.7910/DVN/D74QMP", dv_server_1L_chr = "dataverse.harvard.edu"), type_1L_chr = "repos", what_1L_chr = "reporting") 
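The "changes" field pairs old variable names with the labels to use in reports. The idea behind such a correspondence can be sketched with a named character vector in base R. relabel() is a hypothetical helper written for illustration; it is not part of ready4show.

```r
# Old names map to display names, mirroring old_nms_chr / new_nms_chr above
correspondences <- c(PHQ9 = "PHQ-9", GAD7 = "GAD-7")

# Replace any name that has a correspondence; leave the rest unchanged
relabel <- function(x, lookup) {
  matched <- x %in% names(lookup)
  x[matched] <- lookup[x[matched]]
  x
}

relabel(c("PHQ9", "SOFAS", "GAD7"), correspondences)
```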

Author model catalogues

We download a program for generating a catalogue of models and use it to summarise the models created under each study analysis (one primary and two secondary). The catalogues are saved locally.

A <- author(A, consent_1L_chr = consent_1L_chr, download_tmpl_1L_lgl = T, what_1L_chr = "catalogue")

Author manuscript

We add some content about the manuscript we wish to author.

A <- renew(A, "Quality Adjusted Life Years (QALYs) are often used in economic evaluations, yet utility weights for deriving them are rarely directly measured in mental health services.", 
           type_1L_chr = "background", what_1L_chr = "reporting") %>%
  renew("None declared", type_1L_chr = "conflicts", what_1L_chr = "reporting") %>%
  renew("Nothing should be concluded from this study as it is purely hypothetical.", type_1L_chr = "conclusion", what_1L_chr = "reporting") %>%
  renew("The study was reviewed and granted approval by no-one." , type_1L_chr = "ethics", what_1L_chr = "reporting") %>%
  renew("The study was funded by no-one.", type_1L_chr = "funding", what_1L_chr = "reporting") %>%
  renew("three months", type_1L_chr = "interval", what_1L_chr = "reporting") %>%
  renew(c("anxiety", "AQoL","depression", "psychological distress", "QALYs", "utility mapping"), type_1L_chr = "keywords", what_1L_chr = "reporting") %>%
  renew("The study sample is fake data.", type_1L_chr = "sample", what_1L_chr = "reporting") 

We create a brief summary of results that can be interpreted by the program that authors the manuscript.

A <- renew(A, c("AQoL-6D", "Adolescent AQoL Six Dimension"), type_1L_chr = "naming", what_1L_chr = "reporting")
A <- renew(A, "use_renew_mthd", type_1L_chr = "abstract", what_1L_chr = "reporting")

We create and save the plots that will be used in the manuscript.

A <- author(A, consent_1L_chr = consent_1L_chr, what_1L_chr = "plots")

We download a program for generating a template manuscript and run it to author a first draft of the manuscript.

A <- author(A, consent_1L_chr = consent_1L_chr, download_tmpl_1L_lgl = T, what_1L_chr = "manuscript")

We can copy the RMarkdown files that created the template manuscript to a new directory (called “Manuscript_Submission”) so that we can then manually edit those files to produce a manuscript that we can submit for publication.

A <- author(A, consent_1L_chr = consent_1L_chr, type_1L_chr = "copy", what_1L_chr = "manuscript")

At this point in the workflow, additional steps are required to adapt or author the manuscript that will be submitted for publication. In this example, however, we skip those steps and keep working with the unedited template manuscript. If we had a finalised manuscript authoring program stored online, we could now specify the repository from which the program can be retrieved.

# Not run
# A <- renew(A, c("URL of GitHub repository with", "Program version number"), type_1L_chr = "template-manuscript", what_1L_chr = "reporting")

We can now configure the output to be generated by the manuscript authoring program. The commands below specify a Microsoft Word format manuscript and a PDF technical appendix. Unlike in the template manuscript, the figures and tables will be positioned after (and not within) the main body of the manuscript. Note that the Word version of the manuscript generated with these values will require some minor formatting edits (principally to the display of tables and the numbering of sections).

A <- renew(A, F, type_1L_chr = "figures-body", what_1L_chr = "reporting") %>%
  renew(F, type_1L_chr = "tables-body", what_1L_chr = "reporting") %>%
  renew(c("Word","PDF"), type_1L_chr = "formats", what_1L_chr = "reporting")

Once any edits to the RMarkdown files for creating the submission manuscript have been finalised, we can run the following command to author the manuscript. If we are using a custom manuscript authoring program downloaded from an online repository, the download_tmpl_1L_lgl argument will need to be set to T.

A <- author(A, consent_1L_chr = consent_1L_chr, download_tmpl_1L_lgl = F, type_1L_chr="submission", what_1L_chr = "manuscript")

We can now generate the Supplementary Information for the submission manuscript.

A <- author(A, consent_1L_chr = consent_1L_chr, supplement_fl_nm_1L_chr = "TA_PDF", type_1L_chr="submission", what_1L_chr = "supplement")

Share outputs

We can now share the non-confidential elements (i.e. no copies of individual records) of the outputs we have created via our study's online repository. To run this step you will need write permissions for the online repository. In the step below we share the model catalogues, details of the utility instrument, the shareable mapping models (designed to be used in conjunction with the youthu package), our manuscript files and our supplementary information. In most real-world studies the manuscript would not be shared via an online repository, and the what_chr argument would need to be amended to reflect this.

A <- share(A, types_chr = c("auto", "submission"), what_chr = c("catalogue", "instrument" ,"manuscript", "models", "supplement"))

The dataset we created in the previous step is viewable here: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/D74QMP

Tidy workspace

The preceding steps saved multiple objects (mostly R model objects) that have embedded within them copies of the source dataset. To protect the confidentiality of these records we can now purge all such copies from our output data directory.

A <- author(A, what_1L_chr = "purge")

8 - Find and deploy utility mapping models

Using tools (soon to be formalised into ready4 modules) from the youthu R package, it is possible to find and deploy relevant utility mapping algorithms.

8.1 - Example 1: Predict health utility from psychological and functional measures (PHQ-9 and SOFAS)

This tutorial illustrates the main steps for predicting AQoL-6D utility from psychological and functional measures using a longitudinal dataset in long format.

The section below renders a vignette article from the youthu library.

This vignette outlines a workflow for:

  • Searching, selecting and retrieving transfer to utility models;
  • Preparing a prediction dataset for use with a selected transfer to utility model; and
  • Applying the selected transfer to utility model to a prediction dataset to predict Quality Adjusted Life Years (QALYs).

The practical value of implementing such a workflow is discussed in the economic analysis vignette and in a scientific manuscript. Note that this example uses fake data; it should not be used to inform decision-making.

Search, select and retrieve transfer to utility models

To identify datasets that contain transfer to utility models compatible with youthu (i.e. those developed with the TTU package), you can use the get_ttu_dv_dss function. The function searches the specified dataverses (in the example below, the TTU dataverse) for datasets containing output from the TTU package.

ttu_dv_dss_tb <- get_ttu_dv_dss("TTU")

The ttu_dv_dss_tb table summarises some pertinent details about each dataset containing TTU models found by the preceding command. These details include a link to any scientific summary (the “Article” column) associated with a dataset.

Transfer to Utility Datasets
ID Utility Predictors Article
1 aqol6dtotalw BADS total score, GAD7 total score, K6 total score, OASIS total score, PHQ9 total score, SCARED total score, SOFAS total score

To identify models that predict a specified type of health utility from one or more of a specified subset of predictors, use:

mdls_lup <- get_mdls_lup(ttu_dv_dss_tb = ttu_dv_dss_tb,
                         utility_type_chr = "AQoL-6D",
                         mdl_predrs_in_ds_chr = c("PHQ9 total score",
                                                  "SOFAS total score"))

The preceding command will produce a lookup table with information that includes the catalogue names of models, the predictors used in each model and the analysis that generated each one.

Selected elements from Models Look-Up Table
Catalogue reference         Predictors     Analysis
PHQ9_1_GLM_GSN_LOG          PHQ9           Primary Analysis
PHQ9_1_OLS_CLL              PHQ9           Primary Analysis
PHQ9_SOFAS_1_GLM_GSN_LOG    PHQ9, SOFAS    Primary Analysis
PHQ9_SOFAS_1_OLS_CLL        PHQ9, SOFAS    Primary Analysis
OASIS_SOFAS_1_GLM_GSN_LOG   OASIS, SOFAS   Primary Analysis
OASIS_SOFAS_1_OLS_CLL       OASIS, SOFAS   Primary Analysis
BADS_SOFAS_1_GLM_GSN_LOG    BADS, SOFAS    Primary Analysis
BADS_SOFAS_1_OLS_CLL        BADS, SOFAS    Primary Analysis
K6_SOFAS_1_GLM_GSN_LOG      K6, SOFAS      Primary Analysis
K6_SOFAS_1_OLS_CLL          K6, SOFAS      Primary Analysis
SCARED_SOFAS_1_GLM_GSN_LOG  SCARED, SOFAS  Primary Analysis
SCARED_SOFAS_1_OLS_CLL      SCARED, SOFAS  Primary Analysis
GAD7_SOFAS_1_GLM_GSN_LOG    GAD7, SOFAS    Primary Analysis
GAD7_SOFAS_1_OLS_CLL        GAD7, SOFAS    Primary Analysis
SOFAS_1_GLM_GSN_LOG         SOFAS          Secondary Analysis A
SOFAS_1_OLS_CLL             SOFAS          Secondary Analysis A
OASIS_PHQ9_1_GLM_GSN_LOG    OASIS, PHQ9    Secondary Analysis B
OASIS_PHQ9_1_OLS_CLL        OASIS, PHQ9    Secondary Analysis B
GAD7_PHQ9_1_GLM_GSN_LOG     GAD7, PHQ9     Secondary Analysis B
GAD7_PHQ9_1_OLS_CLL         GAD7, PHQ9     Secondary Analysis B
SCARED_PHQ9_1_GLM_GSN_LOG   SCARED, PHQ9   Secondary Analysis B
SCARED_PHQ9_1_OLS_CLL       SCARED, PHQ9   Secondary Analysis B
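The look-up table above appears to include any model that uses at least one of the requested predictors (for example OASIS_SOFAS_1_OLS_CLL, although OASIS was not requested). Before choosing a model, it can help to separate those from the models whose predictors are all present in your dataset. The sketch below does this in base R on a hand-copied subset of the table; it does not assume anything about the column layout of the actual mdls_lup object.

```r
# Predictor sets for a few models, copied from the look-up table above
mdl_predrs <- list(
  PHQ9_1_OLS_CLL        = "PHQ9",
  PHQ9_SOFAS_1_OLS_CLL  = c("PHQ9", "SOFAS"),
  OASIS_SOFAS_1_OLS_CLL = c("OASIS", "SOFAS"),
  SOFAS_1_OLS_CLL       = "SOFAS",
  GAD7_PHQ9_1_OLS_CLL   = c("GAD7", "PHQ9")
)

# Predictors measured in the (hypothetical) prediction dataset
in_ds <- c("PHQ9", "SOFAS")

# Models that use at least one of the available predictors
uses_any <- vapply(mdl_predrs, function(p) any(p %in% in_ds), logical(1))

# Models whose predictors are all available, i.e. directly applicable
uses_only <- vapply(mdl_predrs, function(p) all(p %in% in_ds), logical(1))

names(mdl_predrs)[uses_any]
names(mdl_predrs)[uses_only]
```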

To review the summary information about the predictive performance of a specific model, use:

get_dv_mdl_smrys(mdls_lup,
                 mdl_nms_chr = "PHQ9_SOFAS_1_OLS_CLL")
#> $PHQ9_SOFAS_1_OLS_CLL
#>        Parameter Estimate    SE          95% CI
#> 1 SD (Intercept)    0.348 0.017   0.312 , 0.382
#> 2      Intercept    0.428 0.129   0.174 , 0.686
#> 3  PHQ9 baseline   -9.115 0.249 -9.601 , -8.618
#> 4    PHQ9 change   -7.331 0.339 -8.007 , -6.665
#> 5 SOFAS baseline    0.960 0.172   0.616 , 1.292
#> 6   SOFAS change    1.146 0.235   0.674 , 1.607
#> 7             R2    0.767 0.012   0.743 , 0.788
#> 8           RMSE    0.925 0.004   0.922 , 0.928
#> 9          Sigma    0.406 0.012   0.384 , 0.429

More information about a selected model can be found in the online model catalogue, the link to which can be obtained with the following command:

get_mdl_ctlg_url(mdls_lup,
                 mdl_nm_1L_chr = "PHQ9_SOFAS_1_OLS_CLL")

#> [1] "https://dataverse.harvard.edu/api/access/datafile/6484935"

Prepare a prediction dataset for use with a selected transfer to utility model

Import data

You can now import and inspect the dataset you plan on using for prediction. In the below example we use fake data.

data_tb <- make_fake_ds_one()
Illustrative example of a prediction dataset
UID Timepoint Date PHQ_total SOFAS_total
Participant_1 Baseline 2022-05-22 7 69
Participant_10 Baseline 2022-04-07 17 60
Participant_10 Follow-up 2022-06-22 17 64
Participant_100 Baseline 2022-07-29 0 76
Participant_1000 Baseline 2022-02-10 0 71
Participant_1000 Follow-up 2022-05-05 0 71

Confirm dataset can be used as a prediction dataset

The prediction dataset must contain variables that correspond to all the predictors of the model you intend to apply. The allowable range and required class of each predictor variable are described in the min_val_dbl, max_val_dbl and class_chr columns of the model predictors lookup table, which can be accessed with a call to the get_predictors_lup function.

predictors_lup <- get_predictors_lup(mdls_lup = mdls_lup,
                                     mdl_nm_1L_chr = "PHQ9_SOFAS_1_OLS_CLL")
Model predictors lookup table
short_name_chr  long_name_chr      min_val_dbl  max_val_dbl  class_chr  increment_dbl  class_fn_chr                mdl_scaling_dbl  covariate_lgl
PHQ9            PHQ9 total score   0            27           integer    1              youthvars::youthvars_phq9   0.01             FALSE
SOFAS           SOFAS total score  0            100          integer    1              youthvars::youthvars_sofas  0.01             TRUE
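The range and class requirements in this lookup table can be checked directly in base R. The sketch below is not youthu's own validation code: check_predictors() is a hypothetical helper, the bounds and classes are copied from the table above, and the dataset variable names (PHQ_total, SOFAS_total) are those of the example prediction dataset.

```r
# Ranges and classes copied from the predictors lookup table above
predr_specs <- data.frame(
  short_name = c("PHQ9", "SOFAS"),
  var_in_ds  = c("PHQ_total", "SOFAS_total"),
  min_val    = c(0, 0),
  max_val    = c(27, 100),
  class      = c("integer", "integer"),
  stringsAsFactors = FALSE
)

# Returns a character vector describing any problems (empty if none)
check_predictors <- function(data, specs) {
  problems <- character(0)
  for (i in seq_len(nrow(specs))) {
    v <- specs$var_in_ds[i]
    if (!v %in% names(data)) {
      problems <- c(problems, paste(v, "is missing"))
      next
    }
    x <- data[[v]]
    if (!inherits(x, specs$class[i])) {
      problems <- c(problems, paste(v, "is not of class", specs$class[i]))
    }
    if (any(x < specs$min_val[i] | x > specs$max_val[i], na.rm = TRUE)) {
      problems <- c(problems, paste(v, "has values outside",
                                    specs$min_val[i], "to", specs$max_val[i]))
    }
  }
  problems
}

# Values copied from the illustrative prediction dataset
example_tb <- data.frame(PHQ_total = c(7L, 17L, 17L, 0L, 0L, 0L),
                         SOFAS_total = c(69L, 60L, 64L, 76L, 71L, 71L))
check_predictors(example_tb, predr_specs)
```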

The prediction dataset must also include both a unique client identifier variable and a measurement time-point identifier variable (which must be a factor with two levels). The dataset also needs to be in long format (i.e. measures at different time-points for the same individual are stacked in separate rows). We can confirm these conditions hold by creating a dataset metadata object using the make_predn_metadata_ls function. In creating the metadata object, the function checks that the dataset can be used in conjunction with the model specified in the mdl_nm_1L_chr argument. If the prediction dataset uses different variable names for the predictors from those specified in the predictors_lup lookup table, a named vector detailing the correspondence between the two sets of variable names needs to be passed to the predr_vars_nms_chr argument. Finally, if you wish to specify a preferred variable name for the predicted utility values, you can pass this name to the utl_var_nm_1L_chr argument.

predn_ds_ls <- make_predn_metadata_ls(data_tb,
                                      id_var_nm_1L_chr = "UID",
                                      msrmnt_date_var_nm_1L_chr = "Date",
                                      predr_vars_nms_chr = c(PHQ9 = "PHQ_total",SOFAS = "SOFAS_total"),
                                      round_var_nm_1L_chr = "Timepoint",
                                      round_bl_val_1L_chr = "Baseline",
                                      utl_var_nm_1L_chr = "AQoL6D_HU",
                                      mdls_lup = mdls_lup,
                                      mdl_nm_1L_chr = "PHQ9_SOFAS_1_OLS_CLL")
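The structural requirements described above can also be inspected in base R before calling make_predn_metadata_ls. check_long_structure() is a hypothetical helper written for illustration; it applies two of the stated conditions to identifier and timepoint values copied from the example prediction dataset and does not replicate every check youthu performs.

```r
# Identifier and timepoint values copied from the example prediction dataset
pred_tb <- data.frame(
  UID = c("Participant_1", "Participant_10", "Participant_10",
          "Participant_100", "Participant_1000", "Participant_1000"),
  Timepoint = factor(c("Baseline", "Baseline", "Follow-up",
                       "Baseline", "Baseline", "Follow-up"),
                     levels = c("Baseline", "Follow-up"))
)

# TRUE/FALSE for each condition: a two-level factor timepoint variable,
# and at most one row per individual per timepoint (long format)
check_long_structure <- function(data, id_var, round_var) {
  rounds <- data[[round_var]]
  c(
    round_is_two_level_factor = is.factor(rounds) && nlevels(rounds) == 2L,
    one_row_per_id_and_round = anyDuplicated(data[c(id_var, round_var)]) == 0L
  )
}

check_long_structure(pred_tb, "UID", "Timepoint")
```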

Apply the selected transfer to utility model to a prediction dataset to predict Quality Adjusted Life Years (QALYs)

Predict health utility at baseline and follow-up timepoints

To generate utility predictions we use the add_utl_predn function. The function needs to be supplied with the prediction dataset (the value passed to argument data_tb) and the validated prediction metadata object we created in the previous step.

data_tb <- add_utl_predn(data_tb,
                         predn_ds_ls = predn_ds_ls)
#> Joining with `by = join_by(UID, Timepoint)`

By default the add_utl_predn function samples model parameter values based on a table of model coefficients when making predictions and constrains predictions to an allowed range. You can override these defaults by adding additional arguments new_data_is_1L_chr = "Predicted" (which uses mean parameter values), force_min_max_1L_lgl = F (removes range constraint) and (if the source dataset makes available downloadable model objects) make_from_tbl_1L_lgl = F. These settings will produce different predictions. It is strongly recommended that you consult the model catalogue (see above) to understand how such decisions may affect the validity of the predicted values that will be generated.
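The difference between predicting from mean parameter values and from sampled parameter values, and the effect of a range constraint, can be sketched in base R. The sampling scheme here (independent normal draws centred on each estimate with its standard error) is an assumption made for illustration, not a description of add_utl_predn's internals. The coefficients and bounds are hypothetical, and the model is on an identity scale for simplicity, whereas the catalogue models may use transformed scales.

```r
set.seed(123)

# Hypothetical coefficient table for a one-predictor model
coefs <- data.frame(term = c("Intercept", "x"),
                    estimate = c(0.9, -0.02),
                    se = c(0.05, 0.002))
x <- c(0, 10, 27)

# Prediction from mean (point-estimate) parameter values
pred_mean <- coefs$estimate[1] + coefs$estimate[2] * x

# Prediction from one set of parameters drawn from independent normal
# distributions centred on the estimates (assumed sampling scheme)
draw <- rnorm(nrow(coefs), mean = coefs$estimate, sd = coefs$se)
pred_sampled <- draw[1] + draw[2] * x

# Constrain predictions to an allowed range (illustrative bounds)
clamp <- function(p, lower, upper) pmin(pmax(p, lower), upper)
clamp(c(-0.1, 0.5, 1.2), lower = 0, upper = 1)
```

Repeating the sampled prediction many times gives a distribution of predictions that reflects parameter uncertainty, whereas the mean-parameter prediction is a single fixed value.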

Prediction dataset with predicted utilities
UID Timepoint Date PHQ_total SOFAS_total AQoL6D_HU
Participant_1 Baseline 2022-05-22 7 69 0.7588738
Participant_10 Baseline 2022-04-07 17 60 0.7074180
Participant_10 Follow-up 2022-06-22 17 64 0.3757341
Participant_100 Baseline 2022-07-29 0 76 0.6393778
Participant_1000 Baseline 2022-02-10 0 71 0.9297959
Participant_1000 Follow-up 2022-05-05 0 71 0.7712380

Our health utility predictions are now available for use and are summarised below.

summary(data_tb$AQoL6D_HU)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> 0.05329 0.43310 0.63513 0.62558 0.83475 1.00000

Calculate QALYs

The last step is to calculate Quality Adjusted Life Years (QALYs), using a method that assumes utility changes at a linear rate between timepoints.

data_tb <- data_tb %>% add_qalys_to_ds(predn_ds_ls = predn_ds_ls,
                                       include_predrs_1L_lgl = F,
                                       reshape_1L_lgl = F)
Prediction dataset with QALYs
UID               Timepoint  Date        PHQ_total  SOFAS_total  AQoL6D_HU  AQoL6D_HU_change_dbl  duration_prd  qalys_dbl
Participant_1     Baseline   2022-05-22  7          69           0.7588738  0.0000000             0S            0.0000000
Participant_10    Baseline   2022-04-07  17         60           0.7074180  0.0000000             0S            0.0000000
Participant_10    Follow-up  2022-06-22  17         64           0.3757341  -0.3316839            76d 0H 0M 0S  0.1126893
Participant_100   Baseline   2022-07-29  0          76           0.6393778  0.0000000             0S            0.0000000
Participant_1000  Baseline   2022-02-10  0          71           0.9297959  0.0000000             0S            0.0000000
Participant_1000  Follow-up  2022-05-05  0          71           0.7712380  -0.1585579            84d 0H 0M 0S  0.1956014
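Assuming a linear change in utility between two measurements, the QALYs accrued over the interval equal the area under the straight line joining the two utility values: their average multiplied by the interval length in years. qalys_linear() below is a hypothetical helper, not a youthu function. With a 365.25-day year it reproduces the qalys_dbl values in the table above (0.1126893 and 0.1956014) to the displayed precision, which suggests, but does not confirm, that add_qalys_to_ds uses the same convention.

```r
# QALYs accrued between two measurements, assuming utility changes
# linearly (trapezoid rule over the interval)
qalys_linear <- function(u_start, u_end, days, days_per_year = 365.25) {
  (u_start + u_end) / 2 * days / days_per_year
}

# Utility values and interval lengths copied from the table above
qalys_linear(0.7074180, 0.3757341, days = 76)  # Participant_10
qalys_linear(0.9297959, 0.7712380, days = 84)  # Participant_1000
```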

8.2 - Example 2: Predict health utility from psychological measures (PHQ-9 and GAD-7)

This tutorial illustrates the main steps for predicting AQoL-6D utility from two psychological measures using a longitudinal dataset in wide format.

The section below renders a vignette article from the youthu library.

This vignette article is an abridged and modified version of another article on predicting Quality Adjusted Life Years with youthu.

Motivation

This article illustrates how to make QALY predictions using a dataset in wide format with no health-utility measures but containing two psychological measures (GAD-7 and PHQ-9).

Install youthu

If not already installed it will be necessary to install the youthu R library. As youthu is not yet available on CRAN, it will be necessary to install it directly from its GitHub repository using an R package like remotes or devtools.

# Uncomment and run if installation is required.
# utils::install.packages("devtools") 
# devtools::install_github("ready4-dev/youthu")

Load required packages

We now load the libraries we will be using in subsequent steps. Note that the ready4, ready4show and ready4use framework libraries will have been installed automatically when youthu was installed, as will the specific readyforwhatsnext module library and the dplyr, purrr, stringr and tidyr CRAN libraries.

Specify data sources

We begin by specifying the sources for our data. In this example, our data sources are online repositories.

X <- Ready4useRepos(dv_nm_1L_chr = "fakes", dv_ds_nm_1L_chr = "https://doi.org/10.7910/DVN/HJXYKQ", 
                    dv_server_1L_chr = "dataverse.harvard.edu",
                    gh_repo_1L_chr = "ready4-dev/youthu", gh_tag_1L_chr = "v0.0.0.91125")

Inspect dataset

We can now inspect the dataset we will be using to make predictions. As this is a demonstration article, we are going to create a custom synthetic dataset. Our first step in doing so is to ingest a preexisting synthetic dataset (in long format) using the method explained in another vignette article.

data_tb <- ingest(X, fls_to_ingest_chr = c("ymh_phq_gad_tb"), metadata_1L_lgl = F)

Our resulting dataset has unique IDs for each participant (character class), timestamps for each data collection timepoint (Date class variables) and GAD-7 and PHQ-9 scores for each timepoint (integer class).

data_tb %>% head() %>% ready4show::print_table(caption_1L_chr = "Dataset", output_type_1L_chr = "HTML") 
Dataset
fkClientID d_interview_date_t1 d_interview_date_t2 gad7_t1 gad7_t2 phq9_t1 phq9_t2
Participant_1 2020-03-22 NA 6 NA 7 NA
Participant_2 2020-06-15 NA 12 NA 13 NA
Participant_3 2020-08-20 NA 16 NA 17 NA
Participant_4 2020-05-23 2020-08-19 12 12 17 14
Participant_5 2020-04-05 2020-07-19 14 6 22 8
Participant_6 2020-06-09 NA 8 NA 8 NA

Get mapping models

We retrieve details of relevant AQoL-6D mapping models for either of the predictors we plan on using. How these models were derived is described in a pre-print, and details of model performance are included in catalogues available in an open access data repository.

mdls_lup <- get_mdls_lup(ttu_dv_dss_tb = get_ttu_dv_dss("TTU"),
                         utility_type_chr = "AQoL-6D",
                         mdl_predrs_in_ds_chr = c("GAD7 total score", "PHQ9 total score"))
mdls_lup[,c(1,2,5)] %>% 
  ready4show::print_table(caption_1L_chr = "Available models", output_type_1L_chr = "HTML") 
Available models
mdl_nms_chr                predrs_ls     source_chr
PHQ9_1_GLM_GSN_LOG         PHQ9          Primary Analysis
PHQ9_1_OLS_CLL             PHQ9          Primary Analysis
GAD7_1_GLM_GSN_LOG         GAD7          Primary Analysis
GAD7_1_OLS_CLL             GAD7          Primary Analysis
PHQ9_SOFAS_1_GLM_GSN_LOG   PHQ9, SOFAS   Primary Analysis
PHQ9_SOFAS_1_OLS_CLL       PHQ9, SOFAS   Primary Analysis
GAD7_SOFAS_1_GLM_GSN_LOG   GAD7, SOFAS   Primary Analysis
GAD7_SOFAS_1_OLS_CLL       GAD7, SOFAS   Primary Analysis
OASIS_PHQ9_1_GLM_GSN_LOG   OASIS, PHQ9   Secondary Analysis B
OASIS_PHQ9_1_OLS_CLL       OASIS, PHQ9   Secondary Analysis B
GAD7_PHQ9_1_GLM_GSN_LOG    GAD7, PHQ9    Secondary Analysis B
GAD7_PHQ9_1_OLS_CLL        GAD7, PHQ9    Secondary Analysis B
SCARED_PHQ9_1_GLM_GSN_LOG  SCARED, PHQ9  Secondary Analysis B
SCARED_PHQ9_1_OLS_CLL      SCARED, PHQ9  Secondary Analysis B

We select our preferred model and retrieve summary data about the model’s predictor variables.

predictors_lup <- get_predictors_lup(mdls_lup = mdls_lup, mdl_nm_1L_chr = "GAD7_PHQ9_1_OLS_CLL")
exhibit(predictors_lup)
Variable  Description       Minimum  Maximum  Class    Increment  Function                   Scaling  Covariate
GAD7      GAD7 total score  0        21       integer  1          youthvars::youthvars_gad7  0.01     FALSE
PHQ9      PHQ9 total score  0        27       integer  1          youthvars::youthvars_phq9  0.01     FALSE

Transform prediction dataset

To be used with the mapping models available to us, our prediction dataset needs to be in long format. We perform the necessary transformation.

data_tb <- transform_ds_to_long(data_tb, predictors_chr = c("gad7", "phq9"),
                             msrmnt_date_var_nm_1L_chr = "d_interview_date", round_var_nm_1L_chr = "When")
#> Joining with `by = join_by(case_id, fkClientID, When)`
#> Joining with `by = join_by(case_id, fkClientID, When)`
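The wide-to-long step can be sketched with base R's reshape(), using three rows copied from the dataset shown earlier. This illustrates the transformation only. transform_ds_to_long() may differ in its details (its output above, for instance, also involves a case_id variable).

```r
# Three rows copied from the wide-format dataset shown earlier
wide_tb <- data.frame(
  fkClientID = c("Participant_1", "Participant_4", "Participant_5"),
  d_interview_date_t1 = as.Date(c("2020-03-22", "2020-05-23", "2020-04-05")),
  d_interview_date_t2 = as.Date(c(NA, "2020-08-19", "2020-07-19")),
  gad7_t1 = c(6L, 12L, 14L), gad7_t2 = c(NA, 12L, 6L),
  phq9_t1 = c(7L, 17L, 22L), phq9_t2 = c(NA, 14L, 8L)
)

# Each element of `varying` lists the wide columns that become one long
# column (named in `v.names`); `times` labels the rows for each timepoint
long_tb <- reshape(wide_tb, direction = "long",
                   varying = list(c("d_interview_date_t1", "d_interview_date_t2"),
                                  c("gad7_t1", "gad7_t2"),
                                  c("phq9_t1", "phq9_t2")),
                   v.names = c("d_interview_date", "gad7", "phq9"),
                   timevar = "When", times = c("t1", "t2"),
                   idvar = "fkClientID")

# Order rows by participant, then timepoint
long_tb <- long_tb[order(long_tb$fkClientID, long_tb$When), ]
rownames(long_tb) <- NULL
long_tb
```

The varying columns are listed explicitly because their names contain several underscores, which would defeat reshape()'s automatic splitting of names into variable and time parts.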

We drop records where we are missing data for either GAD7 or PHQ9 at either timepoint.

data_tb <- transform_ds_to_drop_msng(data_tb, predictors_chr = c("gad7", "phq9"), 
                                      uid_var_nm_1L_chr = "fkClientID")
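The rule applied here can be expressed in base R: find every participant with a missing GAD-7 or PHQ-9 value in any row, then drop all rows for those participants. This is a sketch of the rule as described; we have not checked that it mirrors transform_ds_to_drop_msng() in every detail.

```r
# Long-format example in which one participant lacks follow-up scores
long_tb <- data.frame(
  fkClientID = c("Participant_1", "Participant_1", "Participant_4", "Participant_4"),
  When = c("t1", "t2", "t1", "t2"),
  gad7 = c(6L, NA, 12L, 12L),
  phq9 = c(7L, NA, 17L, 14L)
)

# IDs with a missing GAD-7 or PHQ-9 value in any of their rows
incomplete_ids <- unique(
  long_tb$fkClientID[!complete.cases(long_tb[c("gad7", "phq9")])]
)

# Keep only participants whose rows are all complete
complete_tb <- long_tb[!long_tb$fkClientID %in% incomplete_ids, ]
complete_tb
```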

We now predict AQoL-6D health utility for each case with complete data.

predn_ds_ls <- make_predn_metadata_ls(data_tb,
                                      id_var_nm_1L_chr = "fkClientID",
                                      msrmnt_date_var_nm_1L_chr = "d_interview_date",
                                      predr_vars_nms_chr = c(GAD7 = "gad7", PHQ9 = "phq9"),
                                      round_var_nm_1L_chr = "When",
                                      round_bl_val_1L_chr = "t1",
                                      utl_var_nm_1L_chr = "AQoL6D_HU",
                                      mdls_lup = mdls_lup,
                                      mdl_nm_1L_chr = "GAD7_PHQ9_1_OLS_CLL")
data_tb <- add_utl_predn(data_tb, new_data_is_1L_chr = "Predicted", predn_ds_ls = predn_ds_ls)
#> Joining with `by = join_by(fkClientID, When)`

Finally, we derive QALY predictions from the health utility measures at both time-points.

data_tb <- data_tb %>% add_qalys_to_ds(predn_ds_ls = predn_ds_ls, include_predrs_1L_lgl = F, reshape_1L_lgl = T)
data_tb %>% head() %>%
  ready4show::print_table(caption_1L_chr = "Final dataset", output_type_1L_chr = "HTML",
                          scroll_box_args_ls = list(width = "100%"))
Final dataset
fkClientID        d_interview_date_t1  d_interview_date_t2  gad7_t1  gad7_t2  phq9_t1  phq9_t2  AQoL6D_HU_t1  AQoL6D_HU_t2  AQoL6D_HU_change_dbl_t1  AQoL6D_HU_change_dbl_t2  duration_prd_t1  duration_prd_t2  qalys_dbl_t1  qalys_dbl_t2
Participant_10    2020-08-05           2020-11-07           15       13       17       18       0.2864522     0.2719422     0                        -0.0145101               0S               94d 0H 0M 0S     0             0.0718536
Participant_1000  2020-09-06           2020-12-20           13       10       13       10       0.4773453     0.7036286     0                        0.2262833                0S               105d 0H 0M 0S    0             0.1697498
Participant_1001  2020-07-05           2020-10-15           10       11       10       16       0.9191706     0.4230515     0                        -0.4961191               0S               102d 0H 0M 0S    0             0.1874150
Participant_1003  2020-05-18           2020-08-12           6        8        16       7        0.5828339     0.5727665     0                        -0.0100674               0S               86d 0H 0M 0S     0             0.1360460
Participant_1005  2020-05-09           2020-08-25           14       5        20       9        0.3093288     0.7676893     0                        0.4583605                0S               108d 0H 0M 0S    0             0.1592306
Participant_1006  2020-05-29           2020-08-25           15       9        21       17       0.2440057     0.5715385     0                        0.3275328                0S               88d 0H 0M 0S     0             0.0982449

9 - Use utility mapping algorithms to help implement cost-utility analyses

Using tools (soon to be formalised into ready4 framework modules) from the youthu R package, it is possible to use utility mapping algorithms to help implement cost-utility analyses. This tutorial illustrates the main steps for doing so using psychological and functional measures collected on clinical samples of young people.

The section below renders a vignette article from the youthu library.

This vignette illustrates the rationale for and practical decision-making utility of youthu’s QALYs prediction workflow. Note, this example is illustrated with fake data and should not be used to inform decision-making.

Motivation

The main motivation behind the youthu package is to extend the types of economic analysis that can be undertaken with both single group (e.g. pilot study, health service records) and matched groups (e.g. trial) longitudinal datasets that do not include measures of health utility. This article focuses on its application to matched group datasets.

Example dataset

First, we import our data. In this example we will use a fake dataset.

ds_tb <- make_fake_ds_two()
#> Joining with `by = join_by(fkClientID, study_arm_chr)`

Our dataset includes 268 matched comparisons, with each comparison containing baseline and follow-up records for one intervention arm participant and one control arm participant. The first few records are as follows.

First few records from input dataset
fkClientID       round      date_psx    duration_prd   PHQ9  SOFAS  costs_dbl  study_arm_chr  match_idx_int
Participant_20   Baseline   2023-07-04  0S             16    41     301.1868   Intervention   1
Participant_593  Baseline   2023-05-11  0S             19    43     259.3190   Control        1
Participant_593  Follow-up  2023-11-02  175d 0H 0M 0S  16    65     1290.4220  Control        1
Participant_20   Follow-up  2023-12-29  178d 0H 0M 0S  15    74     1787.4242  Intervention   1
Participant_259  Baseline   2023-08-29  0S             19    39     311.0018   Control        2
Participant_962  Baseline   2023-10-11  0S             10    45     276.2181   Intervention   2

This dataset contains features that make it possible to use it in conjunction with youthu’s economic analysis functions. These requirements are described in the vignette about finding and using compatible models to predict QALYs.

The dataset also contains a cost variable, which is a requirement for most, though not all, of the economic analyses that can be undertaken with youthu.

Limitations of datasets without measures of health utility

A notable omission from the dataset is any measure of utility. This omission means that, in the absence of mapping algorithms such as those included with youthu, the most feasible types of economic evaluation to apply to this dataset would likely be cost-consequence analysis (where a synopsis of the differences in a range of measures is presented alongside cost differences) and cost-effectiveness analysis (where a summary statistic, the incremental cost-effectiveness ratio or ICER, is calculated by dividing the difference in costs by the difference in a single outcome measure).

These types of economic analyses can be relatively simple to interpret if either the intervention or control arm is simultaneously cheaper and more effective across all included outcome measures. However, these conditions don’t hold in our sample data.

summary((ds_tb %>% dplyr::filter(study_arm_chr == "Control" & round == "Baseline"))[5:6])
#>       PHQ9          SOFAS      
#>  Min.   : 0.0   Min.   :39.00  
#>  1st Qu.: 7.0   1st Qu.:60.00  
#>  Median :12.0   Median :66.00  
#>  Mean   :10.9   Mean   :66.13  
#>  3rd Qu.:15.0   3rd Qu.:72.00  
#>  Max.   :19.0   Max.   :89.00
summary((ds_tb %>% dplyr::filter(study_arm_chr == "Control" & round == "Follow-up"))[5:7])
#>       PHQ9            SOFAS         costs_dbl     
#>  Min.   : 0.000   Min.   :39.00   Min.   : 889.9  
#>  1st Qu.: 4.000   1st Qu.:64.00   1st Qu.:1321.1  
#>  Median : 8.000   Median :71.00   Median :1486.7  
#>  Mean   : 8.493   Mean   :70.65   Mean   :1489.0  
#>  3rd Qu.:13.000   3rd Qu.:77.00   3rd Qu.:1627.0  
#>  Max.   :27.000   Max.   :98.00   Max.   :2216.5
summary((ds_tb %>% dplyr::filter(study_arm_chr == "Intervention" & round == "Baseline"))[5:6])
#>       PHQ9           SOFAS      
#>  Min.   : 0.00   Min.   :36.00  
#>  1st Qu.: 7.00   1st Qu.:61.00  
#>  Median :11.00   Median :67.00  
#>  Mean   :10.81   Mean   :66.74  
#>  3rd Qu.:15.00   3rd Qu.:72.25  
#>  Max.   :19.00   Max.   :88.00
summary((ds_tb %>% dplyr::filter(study_arm_chr == "Intervention" & round == "Follow-up"))[5:7])
#>       PHQ9            SOFAS      costs_dbl     
#>  Min.   : 0.000   Min.   :40   Min.   : 923.4  
#>  1st Qu.: 2.000   1st Qu.:60   1st Qu.:1625.6  
#>  Median : 6.500   Median :68   Median :1777.3  
#>  Mean   : 6.851   Mean   :68   Mean   :1807.8  
#>  3rd Qu.:11.000   3rd Qu.:77   3rd Qu.:1996.0  
#>  Max.   :25.000   Max.   :93   Max.   :2872.7

The pattern of results summarised above creates some significant barriers to meaningfully interpreting economic evaluations based on cost-consequence or cost-effectiveness analysis:

  • A cost-effectiveness analysis in which change in PHQ-9 was the benefit measure would be difficult to interpret, as the Intervention arm is both more effective and more costly. This raises the question: is it worth paying the extra dollars for this improvement? Would a judgment of cost-effectiveness also remain the same if the study had measured a slightly different incremental benefit, or recorded change over a longer or shorter time horizon? There is likely no commonly used value-for-money benchmark for improvements measured in PHQ-9, nor is there any time weighting associated with the measure. Furthermore, if the potential funding for the intervention comes from a budget that is also allocated to non-depressive illnesses (e.g. physical health), results from a cost-effectiveness analysis using PHQ-9 as its benefit measure are not readily comparable with economic evaluations of interventions for other illness groups that use different benefit measures and are potentially competing for the same scarce funding.

  • A cost-consequence analysis that summarised the differences in costs alongside the differences in changes in PHQ-9 and SOFAS scores would be difficult to interpret. While the intervention is more effective than control for improvements measured on PHQ-9 (where lower scores are better), the control group is superior if benefits are based on functioning improvements as measured by SOFAS scores (where higher scores are better). Without any formal weighting for trading off clinical symptoms against functioning, interpretation of this analysis will be highly subjective and likely to vary across decision makers.
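The first problem can be made concrete with a small calculation. The sketch below uses a hypothetical helper, calc_icer (not part of youthu), and made-up values rather than the vignette data. It computes a point-estimate incremental cost-effectiveness ratio (ICER) and the quadrant of the cost-effectiveness plane it falls in, assuming effects are coded so that larger is better (here, PHQ-9 points improved).

```r
# Hypothetical helper (not part of youthu): point-estimate ICER and the quadrant of
# the cost-effectiveness plane it falls in. Effects must be coded so that larger
# values are better, e.g. the reduction in PHQ-9 score (lower PHQ-9 is better).
calc_icer <- function(cost_int, eff_int, cost_ctl, eff_ctl) {
  delta_c <- mean(cost_int) - mean(cost_ctl)
  delta_e <- mean(eff_int) - mean(eff_ctl)
  quadrant <- if (delta_c >= 0 && delta_e > 0) {
    "more costly, more effective"
  } else if (delta_c < 0 && delta_e > 0) {
    "less costly, more effective (dominant)"
  } else if (delta_c >= 0) {
    "more costly, not more effective (dominated)"
  } else {
    "less costly, not more effective"
  }
  list(delta_c = delta_c, delta_e = delta_e,
       icer = delta_c / delta_e, quadrant = quadrant)
}

# Made-up illustrative values (not the vignette data): costs in dollars,
# effects as PHQ-9 points improved
res <- calc_icer(cost_int = c(1800, 1900, 1700), eff_int = c(5, 4, 6),
                 cost_ctl = c(1500, 1600, 1400), eff_ctl = c(3, 2, 4))
res$icer      # 300 / 2 = 150 dollars per PHQ-9 point improved
res$quadrant  # "more costly, more effective"
```

The result, $150 per PHQ-9 point, is arithmetically clear. However, it cannot be judged against an agreed threshold or compared with a ratio expressed in a different outcome unit.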

These shortcomings can be substantially addressed by undertaking cost-utility analyses (CUAs), because:

  • they use a measure of benefit, the Quality Adjusted Life Year (QALY), that combines multiple domains of health, weighted by time and by population preferences, into a single index measure that can be applied across health conditions;
  • there are published benchmark willingness-to-pay values for QALYs that decision makers in many countries routinely use to interpret incremental cost-effectiveness ratio (ICER) statistics in the context of health budget allocation.
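One common way to apply a willingness-to-pay benchmark is to convert incremental QALYs into dollars and compute the net monetary benefit (NMB). The sketch below uses a hypothetical helper name and made-up increments. It shows the standard relationship NMB = wtp × ΔQALYs − ΔCosts.

```r
# Hypothetical helper: net monetary benefit (NMB) of an intervention versus control
#   NMB = wtp * delta_qalys - delta_costs
# A positive NMB means the intervention is cost-effective at that threshold.
calc_nmb <- function(delta_qalys, delta_costs, wtp) {
  wtp * delta_qalys - delta_costs
}

# Made-up increments for illustration: 0.02 extra QALYs for $800 extra cost
nmb <- calc_nmb(delta_qalys = 0.02, delta_costs = 800, wtp = 50000)
nmb                 # 50000 * 0.02 - 800 = 200
icer <- 800 / 0.02  # 40,000 dollars per QALY, below the 50,000 benchmark
icer
```

When incremental QALYs are positive, a positive NMB is equivalent to an ICER below the threshold. NMB stays well defined when the ICER is negative or its denominator is close to zero.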

The rest of this article demonstrates how youthu functions can be used to undertake CUA-based analyses of the type of data we have just profiled.

Using youthu in a cost-utility analysis workflow

Predict adolescent AQoL-6D health utility

Our first step is to identify which youthu models we will use to predict adolescent AQoL-6D and apply these models to our data. This step is explained in more detail in another vignette article about finding and using transfer to utility models, so it is dealt with only briefly here.

We ingest metadata about the mapping models we plan to use. NOTE: This is a temporary step that is required because the metadata file is not yet in the study's online repository. This code will stop working once the metadata file has been moved from its temporary location to the study dataset, which we will do when an associated manuscript exits its current review process.

mdl_meta_data_ls <- ingest(Ready4useRepos(gh_repo_1L_chr = "ready4-dev/youthu",
                                          gh_tag_1L_chr = "v0.0.0.91125"),
                           fls_to_ingest_chr = c("mdl_meta_data_ls"),
                           metadata_1L_lgl = FALSE)

We now specify the metadata needed for our dataset to be used as a prediction dataset with the model we intend to use.

predn_ds_ls <- make_predn_metadata_ls(ds_tb,
                                      cmprsn_groups_chr = c("Intervention", "Control"),
                                      cmprsn_var_nm_1L_chr = "study_arm_chr",
                                      costs_var_nm_1L_chr = "costs_dbl",
                                      id_var_nm_1L_chr = "fkClientID",
                                      mdl_meta_data_ls = mdl_meta_data_ls,
                                      msrmnt_date_var_nm_1L_chr = "date_psx",
                                      round_var_nm_1L_chr = "round",
                                      round_bl_val_1L_chr = "Baseline",
                                      utl_var_nm_1L_chr = "AQoL6D_HU",
                                      mdls_lup = get_mdls_lup(utility_type_chr = "AQoL-6D",
                                                              mdl_predrs_in_ds_chr = c("PHQ9 total score",
                                                                                       "SOFAS total score"),
                                                              ttu_dv_nms_chr = "TTU"),
                                      mdl_nm_1L_chr =  "PHQ9_SOFAS_1_OLS_CLL")
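Before building this metadata, it can help to confirm that the dataset actually contains the variables and values that the arguments refer to. The sketch below is a hypothetical base R pre-check. check_predn_ds is not a youthu function, and the toy data frame is made up. Its variable names mirror the arguments passed to make_predn_metadata_ls above.

```r
# Hypothetical pre-check (not part of youthu): confirm that a data frame contains
# the variables and values referred to by the prediction metadata arguments.
check_predn_ds <- function(ds, id_var, round_var, round_bl_val, cmprsn_var,
                           cmprsn_groups, costs_var, predr_vars) {
  needed <- c(id_var, round_var, cmprsn_var, costs_var, predr_vars)
  missing_vars <- setdiff(needed, names(ds))
  if (length(missing_vars) > 0) {
    stop("Missing variables: ", paste(missing_vars, collapse = ", "))
  }
  if (!round_bl_val %in% ds[[round_var]]) {
    stop("Baseline value '", round_bl_val, "' not found in ", round_var)
  }
  unexpected <- setdiff(unique(ds[[cmprsn_var]]), cmprsn_groups)
  if (length(unexpected) > 0) {
    stop("Unexpected comparison groups: ", paste(unexpected, collapse = ", "))
  }
  # Each participant should contribute at most one record per round
  dup <- duplicated(ds[, c(id_var, round_var)])
  if (any(dup)) stop(sum(dup), " duplicated participant-round record(s)")
  invisible(TRUE)
}

# Made-up toy data with the same variable names as ds_tb
toy <- data.frame(fkClientID = c("a", "a", "b", "b"),
                  round = c("Baseline", "Follow-up", "Baseline", "Follow-up"),
                  study_arm_chr = c("Control", "Control", "Intervention", "Intervention"),
                  costs_dbl = c(400, 1600, 420, 1800),
                  PHQ9 = c(8, 10, 9, 0), SOFAS = c(61, 64, 71, 81))
check_predn_ds(toy, id_var = "fkClientID", round_var = "round",
               round_bl_val = "Baseline", cmprsn_var = "study_arm_chr",
               cmprsn_groups = c("Intervention", "Control"),
               costs_var = "costs_dbl", predr_vars = c("PHQ9", "SOFAS"))
```

Failing early with a specific message is usually easier to debug than an error raised deep inside a prediction function.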

We now use our preferred model to predict health utility from the measures in our dataset.

ds_tb <- add_utl_predn(ds_tb,
                       predn_ds_ls = predn_ds_ls) %>%
  dplyr::select(fkClientID, round, study_arm_chr, date_psx, duration_prd, dplyr::everything())
#> Joining with `by = join_by(fkClientID, round)`

Calculate QALYs

Next we combine the health utility data with the time intervals between measurements to calculate QALYs and add them to the dataset.

ds_tb  <- ds_tb %>% add_qalys_to_ds(predn_ds_ls = predn_ds_ls,
                                    include_predrs_1L_lgl = T,
                                    reshape_1L_lgl = T)
First few records from updated dataset with QALYs
| fkClientID | study_arm_chr | match_idx_int | date_psx_Baseline | date_psx_Follow-up | duration_prd_Baseline | duration_prd_Follow-up | costs_dbl_Baseline | costs_dbl_Follow-up | PHQ9_Baseline | PHQ9_Follow-up | SOFAS_Baseline | SOFAS_Follow-up | AQoL6D_HU_Baseline | AQoL6D_HU_Follow-up | PHQ9_change_dbl_Baseline | PHQ9_change_dbl_Follow-up | SOFAS_change_dbl_Baseline | SOFAS_change_dbl_Follow-up | AQoL6D_HU_change_dbl_Baseline | AQoL6D_HU_change_dbl_Follow-up | qalys_dbl_Baseline | qalys_dbl_Follow-up |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Participant_10 | Control | 243 | 2023-04-19 | 2023-10-13 | 0S | 177d 0H 0M 0S | 647.9386 | 1696.235 | 8 | 10 | 61 | 64 | 0.7597988 | 0.6079774 | 0 | 2 | 0 | 3 | 0 | -0.1518214 | 0 | 0.3314119 |
| Participant_1000 | Control | 191 | 2023-06-15 | 2023-12-16 | 0S | 184d 0H 0M 0S | 428.9205 | 1619.037 | 4 | 2 | 63 | 82 | 0.8459579 | 0.7688131 | 0 | -2 | 0 | 19 | 0 | -0.0771448 | 0 | 0.4067322 |
| Participant_1001 | Intervention | 230 | 2023-05-10 | 2023-11-05 | 0S | 179d 0H 0M 0S | 429.3703 | 1844.219 | 10 | 14 | 59 | 72 | 0.6138300 | 0.8607305 | 0 | 4 | 0 | 13 | 0 | 0.2469005 | 0 | 0.3613228 |
| Participant_1003 | Intervention | 115 | 2023-06-08 | 2023-12-07 | 0S | 182d 0H 0M 0S | 395.1637 | 1537.365 | 9 | 0 | 71 | 81 | 0.5808015 | 0.9315788 | 0 | -9 | 0 | 10 | 0 | 0.3507773 | 0 | 0.3768011 |
| Participant_1005 | Intervention | 183 | 2023-09-09 | 2024-03-13 | 0S | 186d 0H 0M 0S | 402.9910 | 1826.511 | 17 | 0 | 78 | 88 | 0.5460607 | 0.9593811 | 0 | -17 | 0 | 10 | 0 | 0.4133204 | 0 | 0.3833158 |
| Participant_1006 | Intervention | 219 | 2023-10-05 | 2024-04-01 | 0S | 179d 0H 0M 0S | 534.2285 | 2401.478 | 9 | 14 | 75 | 73 | 0.7239490 | 0.5885972 | 0 | 5 | 0 | -2 | 0 | -0.1353518 | 0 | 0.3216232 |
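The follow-up QALY values in the table match an area-under-the-curve (trapezoidal) calculation that averages the baseline and follow-up utilities and multiplies by the follow-up duration in years, assuming 365.25 days per year. We checked this against the first three records, not against youthu's source code, so treat the sketch below as a reconstruction rather than the package's implementation. calc_qalys_auc is a hypothetical name.

```r
# Hypothetical reconstruction of a trapezoidal QALY calculation between two
# measurements, assuming 365.25 days per year:
#   QALYs = (utility at baseline + utility at follow-up) / 2 * (days / 365.25)
calc_qalys_auc <- function(utl_bl, utl_fu, days_between) {
  (utl_bl + utl_fu) / 2 * days_between / 365.25
}

# Utilities and follow-up durations for the first two records in the table above
qalys <- calc_qalys_auc(utl_bl = c(0.7597988, 0.8459579),
                        utl_fu = c(0.6079774, 0.7688131),
                        days_between = c(177, 184))
round(qalys, 7)  # 0.3314119 0.4067322
```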

Analyse results

Now we can run the main economic analysis. This is implemented by the make_hlth_ec_smry function, which first bootstraps the dataset (using the boot function from the boot package) and then passes the mean values for costs and QALYs from each bootstrap sample to the bcea function of the BCEA package to calculate a range of health economic statistics. For this example we pass a value of 50,000 for the willingness-to-pay parameter, as this is the dollar amount commonly used in Australia as a benchmark for the value of a QALY.

Note that for this illustrative example we request only 1000 bootstrap iterations; in practice this number may be higher.

he_smry_ls <- ds_tb %>% make_hlth_ec_smry(predn_ds_ls = predn_ds_ls,
                                          wtp_dbl = 50000,
                                          bootstrap_iters_1L_int = 1000L)
#> Warning: There was 1 warning in `dplyr::summarise()`.
#>  In argument: `dplyr::across(.fns = mean)`.
#> Caused by warning:
#> ! Using `across()` without supplying `.cols` was deprecated in dplyr 1.1.0.
#>  Please supply `.cols` instead.
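The bootstrap step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not make_hlth_ec_smry's implementation. It assumes participants are resampled with replacement within each arm, similar to using the strata argument of boot::boot, and records the difference in arm means for each iteration. bootstrap_increments is a hypothetical name, and the data are simulated rather than taken from ds_tb.

```r
# Sketch under assumptions: participants are resampled with replacement within
# each arm, and the difference in arm means is recorded for every iteration.
# make_hlth_ec_smry() may resample differently.
bootstrap_increments <- function(costs, qalys, arm, int_lbl = "Intervention",
                                 ctl_lbl = "Control", iters = 1000L) {
  idx_int <- which(arm == int_lbl)
  idx_ctl <- which(arm == ctl_lbl)
  out <- matrix(NA_real_, nrow = iters, ncol = 2,
                dimnames = list(NULL, c("delta_c", "delta_e")))
  for (i in seq_len(iters)) {
    # Index via sample.int() so that single-participant arms are handled correctly
    s_int <- idx_int[sample.int(length(idx_int), replace = TRUE)]
    s_ctl <- idx_ctl[sample.int(length(idx_ctl), replace = TRUE)]
    out[i, "delta_c"] <- mean(costs[s_int]) - mean(costs[s_ctl])
    out[i, "delta_e"] <- mean(qalys[s_int]) - mean(qalys[s_ctl])
  }
  out
}

# Simulated data for illustration only (not the vignette dataset)
set.seed(2023)
n <- 200
arm <- rep(c("Intervention", "Control"), each = n / 2)
costs <- rnorm(n, mean = ifelse(arm == "Intervention", 1800, 1600), sd = 300)
qalys <- rnorm(n, mean = 0.36, sd = 0.05)
incs <- bootstrap_increments(costs, qalys, arm, iters = 1000L)

# Share of iterations with positive net monetary benefit at $50,000 per QALY
prob_ce <- mean(50000 * incs[, "delta_e"] - incs[, "delta_c"] > 0)
prob_ce
```

Resampling within arms keeps each arm's sample size fixed across iterations. Each row of incs is one point on the cost-effectiveness plane.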

As the output of the make_hlth_ec_smry function includes a BCEA object, we can use the BCEA package to produce a number of graphical summaries of economic results. One of the most important is the cost-effectiveness plane. This plot highlights that, with an ICER of -$98,145.56, only a small fraction of the bootstrapped pairs of incremental costs and QALYs fall within the zone of cost-effectiveness (green). A negative ICER needs careful interpretation, as it arises whenever incremental costs and incremental QALYs have opposite signs. At the willingness-to-pay threshold we supplied, the results suggest there is an 8% probability that the intervention is cost-effective.

BCEA::ceplane.plot(he_smry_ls$ce_res_ls, wtp = 50000, graph = "ggplot2", theme = ggplot2::theme_light())
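The logic behind the plane and the probability of cost-effectiveness can be sketched directly from bootstrapped increments. BCEA computes these summaries for you. The hypothetical helpers below only illustrate the idea, using five made-up pairs. Classifying pairs by quadrant shows whether a negative ICER reflects a dominant intervention (south-east) or a dominated one (north-west). The probability of cost-effectiveness is the share of pairs below the willingness-to-pay line.

```r
# Hypothetical helpers: classify bootstrapped incremental (cost, QALY) pairs by
# quadrant of the cost-effectiveness plane, and compute the probability of
# cost-effectiveness (share of pairs with positive net monetary benefit) at one
# or more willingness-to-pay values, i.e. points on an acceptability curve.
classify_quadrants <- function(delta_c, delta_e) {
  quadrant <- ifelse(delta_e >= 0,
                     ifelse(delta_c >= 0, "NE: more effective, more costly",
                            "SE: more effective, less costly"),
                     ifelse(delta_c >= 0, "NW: less effective, more costly",
                            "SW: less effective, less costly"))
  prop.table(table(quadrant))
}

prob_cost_effective <- function(delta_c, delta_e, wtp_values) {
  vapply(wtp_values, function(k) mean(k * delta_e - delta_c > 0), numeric(1))
}

# Five made-up bootstrap pairs (dollars, QALYs), for illustration only
delta_c <- c(210, 250, 150, -50, 300)
delta_e <- c(0.002, -0.001, 0.005, 0.001, -0.003)
classify_quadrants(delta_c, delta_e)
prob_cost_effective(delta_c, delta_e, wtp_values = c(0, 50000, 100000))
# 0.2 0.4 0.4
```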

10 - Develop choice models

Using tools (soon to be formalised into ready4 framework modules) from the mychoice R package, it is possible to develop choice models from responses to a discrete choice experiment survey.

The section below renders a vignette article from the mychoice library.

library(mychoice)
#> The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
#> which was just loaded, will retire in October 2023.
#> Please refer to R-spatial evolution reports for details, especially
#> https://r-spatial.org/r/2023/05/15/evolution4.html.
#> It may be desirable to make the sf package available;
#> package maintainers should consider adding sf to Suggests:.
#> The sp package is now running under evolution status 2
#>      (status 2 uses the sf package in place of rgdal)

The tools in mychoice are designed to make it easier to develop and use choice models with ready4 - an open source health economic model of the systems shaping mental health and wellbeing in young people.

This development version of the mychoice package has been made available as part of the process of testing and documenting the package.

Currently there are no vignettes available. However, examples of the application of mychoice functions to a real-world discrete choice experiment can be found in programs available at https://doi.org/10.5281/zenodo.6626256 (design of a discrete choice experiment survey) and https://doi.org/10.5281/zenodo.7223286 (analysis of discrete choice experiment survey responses). PDF versions of each program, along with the artefacts produced by each, are available in the online dataset at https://doi.org/10.7910/DVN/VGPIPS.
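As a general illustration of what "developing a choice model" from discrete choice experiment responses involves, the sketch below fits a conditional logit model with survival::clogit. This is a standard technique swapped in for illustration, not mychoice's own workflow, which is documented in the programs linked above. The attributes (cost, wait), their levels and the true coefficients are all hypothetical, and the responses are simulated from a random utility model with Gumbel errors.

```r
library(survival)

# Simulated DCE: 500 choice sets of 3 alternatives with two hypothetical attributes
set.seed(42)
n_sets <- 500
n_alts <- 3
dce <- data.frame(
  set_id = rep(seq_len(n_sets), each = n_alts),
  cost   = sample(c(0, 20, 40), n_sets * n_alts, replace = TRUE),  # dollars
  wait   = sample(c(1, 7, 14), n_sets * n_alts, replace = TRUE)    # days
)
# Random utility: V = -0.05 * cost - 0.10 * wait, plus a standard Gumbel error
utility <- -0.05 * dce$cost - 0.10 * dce$wait - log(-log(runif(nrow(dce))))
# In each choice set, the alternative with the highest utility is chosen
dce$chosen <- as.integer(ave(utility, dce$set_id, FUN = function(u) u == max(u)))

# Conditional logit: one stratum per choice set
fit <- clogit(chosen ~ cost + wait + strata(set_id), data = dce)
coef(fit)

# Marginal willingness to pay (dollars) to avoid one day of waiting;
# the value used to simulate the data is 0.10 / 0.05 = 2
mwtp <- coef(fit)[["wait"]] / coef(fit)[["cost"]]
mwtp
```

Ratios of attribute coefficients to the cost coefficient, as in the last step, are a common way to express choice model results in dollar terms.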