This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Programs for replicating study designs and analyses

R Markdown Programs combine modules of the readyforwhatsnextmodel with compatible datasets to implement reproducible and replicable analyses of youth mental health policy and system design topics.

1: Finding programs
2: Using readyforwhatsnext programs

2.1: Model health utility

2.1.1: Develop health utility mapping algorithms
2.1.2: Predict health utility

2.2: Model youth choices

2.2.1: Design a Discrete Choice Experiment
2.2.2: Analyse the results of a Discrete Choice Experiment

2.3: Create synthetic populations

2.3.1: Create a synthetic population of young people attending primary mental health services

1 - Finding programs

Programs are used to generate and report a model analysis.

What are programs?

Programs can be executed in their current form without the need for additional input data and, unless modified or run interactively (prompting a user for inputs during execution), will always generate the exact same output. They are typically deployed for configuring the run specifications of a computational model, specifying the data to which it will be applied and reporting analysis results.

Why are they used for in readyforwhatsnext?

readyforwhatsnext programs can be used for the following purposes:

to reproduce a study analysis, in which case you will need access to the original study data, and may also need to modify the program to specify the path to this data from your machine;
to replicate a study analysis (ie to apply the study algorithm to similar but different input data [this can be a new sample from the same population or, if used for demonstration purposes, fake data representative of the original study dataset]), in which case you will need to modify the program to specify the path to this data; and
to transfer a study analysis, in which case you use the program as a template to be modified to reflect key differences between the original study and your study.

Current readyforhwatsnext programs

Currently available readyforwhatsnext programs are summarised in the below table.

Program	Release	Date	Description	Source
aqol6dmap_fakes	0.0.9.0	02-Mar-2022	This program generates a purely synthetic (i.e. fake - no trace of any real records) population that is reasonably representative of the input data we used for the utility mapping study described in the article https://doi.org/10.1101/2021.07.07.21260129.	Dev, Archive
aqol6dmap_use	0.1	13-Sep-2022	Apply AQoL-6D Utility Mapping Models To New DataThis release includes minor formatting change and an updated version number.	Dev, Archive
dce_sa_analysis	0.1.1	27-Oct-2022	A self-documenting R Markdown program for analysing responses to a discrete choice experiment exploring the online help-seeking preferences of socially anxious young people.	Dev, Archive
dce_sa_design	0.0.9.3	26-Oct-2022	An R Markdown program to create the experimental design for a Discrete Choice Experiment (DCE) exploring online help seeking in socially anxious young people.This release uses functions from the mychoice R package (https://github.com/ready4-dev/mychoice).	Dev, Archive
ttu_lng_aqol6_csp	0.1	16-Sep-2022	Complete study program to reproduce all steps from data ingest through to results dissemination for a study to map mental health measures to AQoL-6D health utility.	Dev, Archive

Documentation

readyforwhatsnext programs are typically self-documenting, meaning that each section of code is integrated with plain English descriptions of the purpose it fulfills. The only programs that are not self-documenting are those whose primary purpose is to produce a document (normally an analysis report). Self-documenting programs and sub-routines will be typically documented as a PDF or HTML render of the RMarkdown source file. This rendered document will be bundled with the program, but in some cases may also be shared in online data repositories.

2 - Using readyforwhatsnext programs

The code used when applying readyforwhatsnext modules to a number of real world youth mental health policy and research projects is publicly available for review and reuse.

2.1 - Model health utility

Replication programs for developing, finding and applying utility mapping algorithms.

2.1.1 - Develop health utility mapping algorithms

Using modules from the TTU, youthvars, scorz and specific libraries, we developed utility mapping algorithms from a sample of young people attending primary mental health care services.

This below section embeds a PDF version of an R Markdown program. The following alternative options may provide improved viewing experience, more contextual information and access to more useful code formats:

2.1.2 - Predict health utility

Using functions (soon to be formalised into ready4 framework modules) from the youthu R package, we predicted health utility for a synthetic population of young people attending primary mental health care services.

This below section embeds a PDF version of an R Markdown program. The following alternative options may provide improved viewing experience, more contextual information and access to more useful code formats:

2.2 - Model youth choices

Replication programs for designing, analysing and reporting discrete choice experiments.

2.2.1 - Design a Discrete Choice Experiment

We used functions (soon to be formalised into ready4 modules) from the mychoice R package to design to a discrete choice experiment.

This below section embeds a PDF version of an R Markdown program. The following alternative options may provide improved viewing experience, more contextual information and access to more useful code formats:

2.2.2 - Analyse the results of a Discrete Choice Experiment

Using functions (soon to be formalised into ready4 framework modules) from the mychoice R package, it is possible to develop choice models from responses to a discrete choice experiment survey.

This below section embeds a PDF version of an R Markdown program. The following alternative options may provide improved viewing experience, more contextual information and access to more useful code formats:

2.3 - Create synthetic populations

Replication programs for constructing synthetic populations.

2.3.1 - Create a synthetic population of young people attending primary mental health services

We created a basic synthetic dataset of to represent a clinical youth mental health sample.

This below section renders an R Markdown program. The following alternative options may provide improved viewing experience, more contextual information and access to more useful code formats:

Introduction

This program generates a purely synthetic (i.e. fake - no trace of any real records) population that is reasonably representative of the input data we used for the utility mapping study described in the article https://doi.org/10.1101/2021.07.07.21260129.

No access to the real data is required in order to use this program - it is based on summary statistics (e.g. means and standard deviations of variables, correlation matrices). It should be noted however, that a different (and simpler) workflow can be implemented when you do have access to the source dataset (for example, by using the syn function from the synthpop package).

The output of this program is very similar but not identical to a fake dataset created by an earlier version of this program and which is saved in the “ymh_clinical_dict_r3.RDS” file from the https://doi.org/10.7910/DVN/HJXYKQ data repository.

Install required R packages

If you do not have the following packages already installed, uncomment and run the following lines.

# install.packages("faux")
# devtools::install_github("ready4-dev/ready4) 
# devtools::install_github("ready4-dev/youthvars) 
# devtools::install_github("ready4-dev/scorz) 
# devtools::install_github("ready4-dev/specific") 
# devtools::install_github("ready4-dev/TTU")
# devtools::install_github("ready4-dev/youthu")

Load the ready4 framework package.

library(ready4)

Specify parameters to generate outcome fake data

AQoL item response parameters

The first set of input data are the proportions for each allowed response for each of the twenty AQOL-6D questions at both baseline and followup.

aqol_items_prpns_tbs_ls <- list(bl_answer_props_tb = tibble::tribble(
    ~Question, ~Answer_1, ~Answer_2, ~Answer_3, ~Answer_4, ~Answer_5, ~Answer_6,
    "Q1", 0.35, 0.38, 0.16, 0.03, NA_real_,100, # Check item 5 in real data.
    "Q2", 0.28, 0.38, 0.18, 0.08, 0.04,100,
    "Q3", 0.78, 0.18, 0.03, 0.01, 0.0, 100,
    "Q4", 0.64, 0.23, 0.09, 0.0, 100, NA_real_,
    "Q5", 0.3, 0.48, 0.12, 0.05, 100, NA_real_,
    "Q6", 0.33, 0.48, 0.15, 100, NA_real_,NA_real_,
    "Q7", 0.44, 0.27, 0.11, 100, NA_real_, NA_real_,
    "Q8", 0.18, 0.29, 0.23, 0.21, 100, NA_real_,
    "Q9", 0.07, 0.27, 0.19, 0.37, 100, NA_real_,
    "Q10", 0.04, 0.15, 0.4, 0.25, 100, NA_real_,
    "Q11", 0.03, 0.13, 0.52, 0.25, 100, NA_real_,
    "Q12", 0.06, 0.21, 0.25, 0.34, 100, NA_real_,
    "Q13", 0.05, 0.25, 0.31, 0.28, 100, NA_real_,
    "Q14", 0.05, 0.3, 0.34, 0.25, 100, NA_real_,
    "Q15", 0.57, 0.25, 0.12, 100, NA_real_,NA_real_,
    "Q16", 0.48, 0.42, 0.06, 100, NA_real_, NA_real_,
    "Q17", 0.44, 0.3, 0.16, 0.07, 100, NA_real_,
    "Q18", 0.33, 0.38, 0.25, 0.04, 0.0, 100,
    "Q19", 0.33, 0.49, 0.16, 0.02, 0.0, 100,
    "Q20", 0.67, 0.21, 0.02, 100, NA_real_,NA_real_),
    fup_answer_props_tb = tibble::tribble(
    ~Question, ~Answer_1, ~Answer_2, ~Answer_3, ~Answer_4, ~Answer_5, ~Answer_6,
    "Q1", 0.51, 0.33, 0.12, 0.02, NA_real_, 100,
    "Q2", 0.36, 0.38, 0.16, 0.06, 0.02,100,
    "Q3", 0.81, 0.15, 0.04, 0.00, 0.0, 100,
    "Q4", 0.73, 0.18, 0.09, 0.0, 100, NA_real_,
    "Q5", 0.36, 0.42, 0.12, 0.05, 100, NA_real_,
    "Q6", 0.48, 0.40, 0.11, 100, NA_real_,NA_real_,
    "Q7", 0.57, 0.25, 0.09, 100, NA_real_, NA_real_,
    "Q8", 0.31, 0.33, 0.17, 0.12, 100, NA_real_,
    "Q9", 0.13, 0.35, 0.19, 0.23, 100, NA_real_,
    "Q10", 0.1, 0.21, 0.43, 0.16, 100, NA_real_,
    "Q11", 0.06, 0.25, 0.48, 0.18, 100, NA_real_,
    "Q12", 0.08, 0.27, 0.26, 0.25, 100, NA_real_,
    "Q13", 0.07, 0.37, 0.31, 0.19, 100, NA_real_,
    "Q14", 0.08, 0.37, 0.34, 0.15, 100, NA_real_,
    "Q15", 0.62, 0.23, 0.09, 100, NA_real_,NA_real_,
    "Q16", 0.52, 0.40, 0.06, 100, NA_real_, NA_real_,
    "Q17", 0.51, 0.28, 0.15, 0.06, 100, NA_real_,
    "Q18", 0.37, 0.35, 0.25, 0.03, 0.0, 100,
    "Q19", 0.43, 0.40, 0.16, 0.01, 0.0, 100,
    "Q20", 0.77, 0.21, 0.02, 100, NA_real_,NA_real_)) %>%
  youthvars::make_complete_prpns_tbs_ls()

Outcome variable correlation parameters

First we specify the names of variables we will be creating as outcome variables.

var_names_chr <- c("aqol6d_total_w","phq9_total","bads_total",
                   "gad7_total","oasis_total","scared_total","k6_total")

The next step is to specify the correlations between outcome variables (variables assumed to be ordered as in previous step) at baseline and follow-up timepoints.

cor_mat_ls <- list(matrix(c(1,-0.78,0.72,-0.67,-0.71,-0.65,-0.67,
                               NA,1,-0.73,0.69,0.66,0.63,0.71,
                               NA,NA,1,-.57,-0.64,-0.57,-0.65,
                               NA,NA,NA,1,0.74,0.70,0.63,
                               NA,NA,NA,NA,1,0.7,0.59,
                               NA,NA,NA,NA,NA,1,0.55,
                               NA,NA,NA,NA,NA,NA,1),7,7),
                    matrix(c(1,-0.81,0.72,-0.71,-0.73,-0.64,-0.68,
                        NA,1,-0.72,0.69,0.68,0.61,0.68,
                        NA,NA,1,-0.59,-0.61,-0.51,-0.61,
                        NA,NA,NA,1,0.75,0.71,0.6,
                        NA,NA,NA,NA,1,0.68,0.59,
                        NA,NA,NA,NA,NA,1,0.52,
                        NA,NA,NA,NA,NA,NA,1),7,7))

We now specify the univariate distribution parameters for each of the outcome variables.

synth_data_spine_ls <- list(cor_mat_ls = cor_mat_ls,
                            nbr_obs_dbl = c(1068,643),
                            timepoint_nms_chr = c("BL","FUP"),
                            means_ls = list(c(0.6,12.8,78.2, 10.4,8.1,34.2,12.2),
                                            c(0.7,9.8,89.4, 7.9,6.3,28.8,9.8)),
                            sds_ls = list(c(0.2,6.6,24.8,5.7,4.7,17.9,5.8),
                                          c(0.2,6.5,24.4,5.5,4.3,17.8,5.9)),
                            missing_ls = list(c(0,4,10,6,7,7,4),
                                              c(0,5,2,2,1,2,2)),
                            min_max_ls = list(c(0.03,1),
                                              c(0,27),
                                              c(0,150),
                                              c(0,21),
                                              c(0,20),
                                              c(0,82),
                                              c(0,24)),
                            discrete_lgl = c(F,rep(T,6)),
                            var_names_chr = var_names_chr,
                            aqol_tots_var_nms_chr = c(cumulative = "aqol6d_total_c",
                                                      weighted = "aqol6d_total_w"))

Generate fake data

Create fake outcome variable datasets

We now use the parameters we have just specified to create baseline and follow-up datasets with fake data for our nominated outcome variables.

aqol_scores_pars_ls <- list(means_dbl = c(44.5,40.6), 
                            sds_dbl = c(9.9,9.8),
                            corr_dbl = -0.95)
aqol6d_adol_pop_tbs_ls <- aqol_items_prpns_tbs_ls %>%
  scorz::make_aqol6d_adol_pop_tbs_ls(aqol_scores_pars_ls = aqol_scores_pars_ls,
                                     series_names_chr =  c("bl_outcomes_tb",
                                                           "fup_outcomes_tb"),
                                     synth_data_spine_ls = synth_data_spine_ls,
                                     temporal_cors_ls = list(aqol6d_total_w = 0.85))
#> Joining with `by = join_by(id, match_var_chr)`
#> Joining with `by = join_by(id)`
#> Joining with `by = join_by(id, match_var_chr)`
#> Joining with `by = join_by(id)`

Create fake descriptive variables

We now specify the names and statistical parameters of the variables we will be using in descriptive statistics. For this analysis we are not interested in capturing the joint distribution between these variables, so we only use univariate parameters.

descriptives_BL_tb <- tibble::tibble(fkClientID = aqol6d_adol_pop_tbs_ls$bl_outcomes_tb$fkClientID,
                                     round = c(1) ,
                                     d_age = rnorm(1068,18.1,3.3) %>% 
                                       purrr::map_dbl(~min(.x,25) %>% 
                                                        max(12)),
                                     d_gender = c(rep(1,653),
                                                  rep(2,359),
                                                  rep(3,39),
                                                  rep(NA_real_,17)) %>% 
                                       specific::scramble_xx() %>%
                                       factor(labels = c("Female","Male","Other")),
                                     d_sexual_ori_s = c(rep(1,738),
                                                        rep(2,289),
                                                        rep(NA_real_,41)) %>% 
                                       specific::scramble_xx() %>%
                                       factor(labels = c("Straight","Other")),
                                     Region = c(rep(1,671),
                                                rep(2,397)) %>% 
                                       specific::scramble_xx() %>%
                                       factor(labels = c("Metro","Regional")),
                                     CALD = c(rep(T,759),
                                              rep(F,169),
                                              rep(NA,140)) %>% 
                                       specific::scramble_xx(),
                                     d_studying_working = c(rep(1,405),
                                                            rep(2,167),
                                                            rep(3,305),
                                                            rep(4,159),
                                                            rep(NA_real_,32)) %>% 
                                       specific::scramble_xx() %>% 
                                       factor(labels = c("Studying only",
                                                         "Working only",
                                                         "Studying and working",
                                                         "Not studying or working")),
                                     c_p_diag_s = c(rep(1,182),
                                                    rep(2,264),
                                                    rep(3,332),
                                                    rep(4,237),
                                                    rep(NA_real_,53)) %>% 
                                       specific::scramble_xx() %>%
                         factor(labels = c("Depression", "Anxiety","Depression and Anxiety", "Other")),
                         c_clinical_staging_s = c(rep(1,625),
                                                  rep(2,326),
                                                  rep(3,86),
                                                  rep(NA_real_,31)) %>% 
                           specific::scramble_xx() %>%
                           factor(labels = c("0-1a","1b","2-4")),
                         c_sofas = c(rnorm(1068-30,65.2,9.5),
                                     rep(NA_real_,30)) %>% 
                           purrr::map_dbl(~min(.x,100) %>% 
                                            max(0)) %>% 
                           specific::scramble_xx(),
                         s_centre = NA_character_, 
                         d_agegroup = NA_character_, 
                         d_sex_birth_s = NA_character_, 
                         d_country_bir_s = NA_character_,
                         d_ATSI = NA_character_,
                         d_english_home = NA, 
                         d_english_native = NA, 
                         d_relation_s = c(rep(1,325),
                                          rep(2,426),
                                          rep(3,286),
                                          rep(NA_real_,31)) %>% 
                           specific::scramble_xx() %>%
                           factor(labels = c("REPLACE_ME_1",
                                             "REPLACE_ME_2",
                                             "REPLACE_ME_3")))  %>%
  dplyr::mutate(d_sex_birth_s = dplyr::case_when(is.na(d_gender) ~ NA_integer_,
                                                 as.integer(d_gender) %in% 
                                                   c(1L,2L) & 
                                                   runif(1068)>0.995 ~ as.integer(d_gender) %>%
                                                   purrr::map_int(~ ifelse(is.na(.x), 
                                                                           .x, 
                                                                           switch(.x,2L,1L,3L))),
                                                 as.integer(d_gender) == 3 ~ sample(c(1L,2L), 
                                                                                    1068, 
                                                                                    replace = T),
                                                 TRUE ~ as.integer(d_gender)
                                                 ) %>%
                  factor(labels = c("Female","Male")))
descriptives_FUP_tb <- descriptives_BL_tb %>% 
  dplyr::filter(fkClientID %in% 
                  aqol6d_adol_pop_tbs_ls$fup_outcomes_tb$fkClientID) %>%
  dplyr::mutate(round = 2,
                d_age = d_age + 0.25,
                Region = Region %>% 
                  specific::randomise_changes_in_fct_lvls(0.98),
                d_studying_working = d_studying_working %>%
                  specific::randomise_changes_in_fct_lvls(0.9),
                c_p_diag_s = c_p_diag_s %>% 
                  specific::randomise_changes_in_fct_lvls(0.90),
                c_clinical_staging_s = c_clinical_staging_s %>% 
                  specific::randomise_changes_in_fct_lvls(0.8),
                c_sofas = c_sofas + rnorm(643,4.7,10) %>% 
                         purrr::map_dbl(~min(.x,100) %>% max(0)))
bl_tb <- dplyr::inner_join(descriptives_BL_tb,
                           aqol6d_adol_pop_tbs_ls$bl_outcomes_tb) 
#> Joining with `by = join_by(fkClientID)`
fup_tb <- dplyr::inner_join(descriptives_FUP_tb,
                            aqol6d_adol_pop_tbs_ls$fup_outcomes_tb)
#> Joining with `by = join_by(fkClientID)`

We make some adjustments to ensure that the c_sofas variable is correlated with our aqol6d_total_w variable at both baseline and follow-up.

bl_tb <- bl_tb %>%
  dplyr::mutate(c_sofas = faux::rnorm_pre(bl_tb$aqol6d_total_w %>% 
                                            as.vector(), 
                                          mu = 65.2, 
                                          sd = 9.5, 
                                          r = 0.5, 
                                          empirical = T) %>% 
                  purrr::map_dbl(~min(.x,100) %>% max(0)))
fup_tb <- fup_tb %>%
  dplyr::mutate(c_sofas = faux::rnorm_pre(fup_tb$aqol6d_total_w %>% 
                                            as.vector(), 
                                          mu = 69.9, 
                                          sd = 10, 
                                          r = 0.5, 
                                          empirical = T) %>% 
                  purrr::map_dbl(~min(.x,100) %>% max(0)))

Combine datasets

We now add the fake outcome variables dataset to the fake descriptive variables dataset.

composite_tb <- dplyr::bind_rows(bl_tb, fup_tb) %>%
  dplyr::mutate(d_age = floor(d_age)) %>%
  dplyr::mutate(d_gender = d_gender %>% as.character() %>%
                  purrr::map_chr(~ifelse(.x=="Other",
                                         sample(c("Genderqueer/gender nonconforming/agender",
                                                              "Transgender"),1),
                                         .x)),
                s_centre = Region %>% as.character() %>%
                  purrr::map_chr(~ifelse(.x=="Metro",
                                         sample(c("Canberra","Southport","Knox"),1),
                                         "Regional Centre")),
                d_country_bir_s = CALD %>%
                  purrr::map_chr(~ifelse(.x,
                                         "Other",
                                         "Australia")), 
                       d_ATSI = CALD %>%
                  purrr::map_chr(~ifelse(.x,
                                         "Yes",
                                         "No")),
                       d_english_home = CALD %>%
                  purrr::map_chr(~ifelse(.x,
                                         "No",
                                         "Yes")), 
                       d_english_native = CALD %>%
                  purrr::map_chr(~ifelse(.x,
                                         "No",
                                         "Yes"))
                ) %>%
  dplyr::select(-CALD) %>%
  dplyr::select(-Region)
composite_tb <- composite_tb %>%
  dplyr::select(-setdiff(names(composite_tb)[startsWith(names(composite_tb),
                                                        "aqol6d_")],
                         names(composite_tb)[startsWith(names(composite_tb),
                                                        "aqol6d_q")]))
composite_tb <- composite_tb %>%
  dplyr::mutate(c_sofas = as.integer(round(c_sofas,0))) %>%
  dplyr::mutate(round = factor(round, labels = c("Baseline",
                                                 "Follow-up"))) %>%
  dplyr::mutate(d_relation_s = dplyr::case_when(d_relation_s %in% 
                                                  c("REPLACE_ME_1","REPLACE_ME_2") ~ 
                                                  "Not in a relationship",
                                                T ~ "In a relationship")) %>%
  youthu::add_dates_from_dstr(bl_start_date_dtm = Sys.Date() - lubridate::days(600),##
                              bl_end_date_dtm = Sys.Date() - lubridate::days(420),
                              duration_args_ls = list(a = 60, b = 140, mean = 90, sd = 10),
                              duration_fn = truncnorm::rtruncnorm,
                              date_var_nm_1L_chr = "d_interview_date") %>%
  dplyr::select(-duration_prd) %>%
  youthvars::transform_raw_ds_for_analysis() %>%
  dplyr::rename(phq9_total = PHQ9,
                bads_total = BADS,
                gad7_total = GAD7,
                oasis_total = OASIS,
                scared_total = SCARED,
                k6_total = K6,
                c_sofas = SOFAS) %>%
  dplyr::select(-c("d_agegroup","Gender", "CALD", "Region"))