Example 2: Predict health utility from psychological measures (PHQ-9 and GAD-7)

This tutorial illustrates the main steps for predicting AQoL-6D utility from two psychological measures using a longitudinal dataset in wide format.

Motivation

This article illustrates how to make QALY predictions using a dataset in wide format with no health-utility measures but containing two psychological measures (GAD-7 and PHQ-9).

Install youthu

If youthu is not already installed, you will need to install it. As youthu is not yet available on CRAN, install it directly from its GitHub repository using an R package such as remotes or devtools.

# Uncomment and run if installation is required.
# utils::install.packages("devtools") 
# devtools::install_github("ready4-dev/youthu")
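
The installation can also be made conditional, so that rerunning a script does not reinstall the package. The following is a minimal sketch that uses remotes rather than devtools; the helper names needs_install and install_youthu_if_missing are illustrative and are not part of youthu.

```r
# Hypothetical helper: TRUE when a package cannot be found in the current library paths.
needs_install <- function(pkg_nm) {
  !requireNamespace(pkg_nm, quietly = TRUE)
}

# Hypothetical helper: install youthu from GitHub only when it is missing.
install_youthu_if_missing <- function() {
  if (needs_install("youthu")) {
    if (needs_install("remotes")) {
      utils::install.packages("remotes")
    }
    remotes::install_github("ready4-dev/youthu")
  }
  # Report (invisibly) whether youthu is now available.
  invisible(!needs_install("youthu"))
}

# Uncomment and run if installation is required.
# install_youthu_if_missing()
```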

Load required packages

We now load the libraries we will be using in subsequent steps. Note that the ready4, ready4show and ready4use framework libraries will have been installed automatically when youthu was installed. The specific library (a readyforwhatsnext module library) and the dplyr, purrr, stringr and tidyr CRAN libraries will have been installed at the same time.

library(ready4)
library(ready4show)
library(ready4use)
library(specific)
library(youthu)

Specify data sources

We begin by specifying the sources for our data. In this example, our data sources are online repositories.

X <- Ready4useRepos(dv_nm_1L_chr = "fakes", dv_ds_nm_1L_chr = "https://doi.org/10.7910/DVN/HJXYKQ", 
                    dv_server_1L_chr = "dataverse.harvard.edu",
                    gh_repo_1L_chr = "ready4-dev/youthu", gh_tag_1L_chr = "v0.0.0.91125")

Inspect dataset

We can now inspect the dataset we will be using to make predictions. As this is a demonstration article, we use a synthetic dataset. We ingest a preexisting synthetic dataset (in wide format) using the method explained in another vignette article.

data_tb <- ingest(X, fls_to_ingest_chr = c("ymh_phq_gad_tb"), metadata_1L_lgl = FALSE)

Our resulting dataset has unique IDs for each participant (character class), timestamps for each data collection timepoint (Date class variables) and GAD-7 and PHQ-9 scores for each timepoint (integer class).

data_tb %>% head() %>% ready4show::print_table(caption_1L_chr = "Dataset", output_type_1L_chr = "HTML")

Dataset
fkClientID	d_interview_date_t1	d_interview_date_t2	gad7_t1	gad7_t2	phq9_t1	phq9_t2
Participant_1	2020-03-22	NA	6	NA	7	NA
Participant_2	2020-06-15	NA	12	NA	13	NA
Participant_3	2020-08-20	NA	16	NA	17	NA
Participant_4	2020-05-23	2020-08-19	12	12	17	14
Participant_5	2020-04-05	2020-07-19	14	6	22	8
Participant_6	2020-06-09	NA	8	NA	8	NA

Get mapping models

We retrieve details of relevant AQoL-6D mapping models for either of the predictors we plan on using. How these models were derived is described in a pre-print, and details of model performance are included in catalogues available in an open access data repository.

mdls_lup <- get_mdls_lup(ttu_dv_dss_tb = get_ttu_dv_dss("TTU"),
                         utility_type_chr = "AQoL-6D",
                         mdl_predrs_in_ds_chr = c("GAD7 total score", "PHQ9 total score"))

mdls_lup[,c(1,2,5)] %>% 
  ready4show::print_table(caption_1L_chr = "Available models", output_type_1L_chr = "HTML")

Available models
mdl_nms_chr	predrs_ls	source_chr
PHQ9_1_GLM_GSN_LOG	PHQ9	Primary Analysis
PHQ9_1_OLS_CLL	PHQ9	Primary Analysis
GAD7_1_GLM_GSN_LOG	GAD7	Primary Analysis
GAD7_1_OLS_CLL	GAD7	Primary Analysis
PHQ9_SOFAS_1_GLM_GSN_LOG	PHQ9 , SOFAS	Primary Analysis
PHQ9_SOFAS_1_OLS_CLL	PHQ9 , SOFAS	Primary Analysis
GAD7_SOFAS_1_GLM_GSN_LOG	GAD7 , SOFAS	Primary Analysis
GAD7_SOFAS_1_OLS_CLL	GAD7 , SOFAS	Primary Analysis
OASIS_PHQ9_1_GLM_GSN_LOG	OASIS, PHQ9	Secondary Analysis B
OASIS_PHQ9_1_OLS_CLL	OASIS, PHQ9	Secondary Analysis B
GAD7_PHQ9_1_GLM_GSN_LOG	GAD7, PHQ9	Secondary Analysis B
GAD7_PHQ9_1_OLS_CLL	GAD7, PHQ9	Secondary Analysis B
SCARED_PHQ9_1_GLM_GSN_LOG	SCARED, PHQ9	Secondary Analysis B
SCARED_PHQ9_1_OLS_CLL	SCARED, PHQ9	Secondary Analysis B
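
Not every listed model can be applied to our dataset, because some require predictors (SOFAS, OASIS, SCARED) that we have not collected. As a rough sketch, the shortlist of usable models can be worked out in base R. The data frame below copies a subset of rows from the table above into a simplified character column; the object names models_df, available_chr and usable_df are illustrative and are not youthu objects.

```r
# A subset of the models listed above, with predictors stored as plain text.
models_df <- data.frame(
  mdl_nms_chr = c("PHQ9_1_OLS_CLL", "GAD7_1_OLS_CLL", "PHQ9_SOFAS_1_OLS_CLL",
                  "GAD7_PHQ9_1_OLS_CLL", "SCARED_PHQ9_1_OLS_CLL"),
  predrs_chr = c("PHQ9", "GAD7", "PHQ9 , SOFAS", "GAD7, PHQ9", "SCARED, PHQ9"),
  stringsAsFactors = FALSE
)

# Predictors that are present in our dataset.
available_chr <- c("GAD7", "PHQ9")

# Keep a model only if every one of its predictors is available.
uses_only_available <- vapply(
  strsplit(models_df$predrs_chr, ","),
  function(predrs) all(trimws(predrs) %in% available_chr),
  logical(1)
)
usable_df <- models_df[uses_only_available, ]
usable_df$mdl_nms_chr
```

Of the models in this subset, only the two-predictor GAD7_PHQ9 model uses both of the measures we have, which is one reason to prefer it here.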

We select our preferred model and retrieve summary data about the model’s predictor variables.

predictors_lup <- get_predictors_lup(mdls_lup = mdls_lup, mdl_nm_1L_chr = "GAD7_PHQ9_1_OLS_CLL")

exhibit(predictors_lup)

Variable	Description	Minimum	Maximum	Class	Increment	Function	Scaling	Covariate
GAD7	GAD7 total score	0	21	integer	1	youthvars::youthvars_gad7	0.01	FALSE
PHQ9	PHQ9 total score	0	27	integer	1	youthvars::youthvars_phq9	0.01	FALSE
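
The minimum and maximum values in this lookup suggest a simple sanity check on the prediction dataset before any model is applied. The sketch below is plain base R with the ranges copied from the table above; is_valid_score is a hypothetical helper, not a youthu function, and it passes missing values through because they are dealt with in a later step.

```r
# Predictor ranges copied from the predictors lookup table.
ranges_df <- data.frame(
  Variable = c("GAD7", "PHQ9"),
  Minimum = c(0L, 0L),
  Maximum = c(21L, 27L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for scores that are missing, or are whole numbers
# lying within the allowed range for the named variable.
is_valid_score <- function(scores, variable, ranges = ranges_df) {
  rng <- ranges[ranges$Variable == variable, ]
  is.na(scores) |
    (scores == round(scores) & scores >= rng$Minimum & scores <= rng$Maximum)
}

# Illustrative scores: the last value exceeds the GAD-7 maximum of 21.
gad7_scores <- c(6L, 12L, 16L, NA, 22L)
gad7_ok <- is_valid_score(gad7_scores, "GAD7")
gad7_ok
```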

Transform prediction dataset

To be used with the mapping models available to us, our prediction dataset needs to be in long format. We perform the necessary transformation.

data_tb <- transform_ds_to_long(data_tb, predictors_chr = c("gad7", "phq9"),
                             msrmnt_date_var_nm_1L_chr = "d_interview_date", round_var_nm_1L_chr = "When")
#> Joining with `by = join_by(case_id, fkClientID, When)`
#> Joining with `by = join_by(case_id, fkClientID, When)`
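
To make the change of shape concrete, the sketch below performs a comparable wide-to-long reshape on three of the rows shown earlier, using base R's reshape() rather than youthu. It is only an illustration of the target layout (one row per participant per timepoint); the columns produced by transform_ds_to_long may differ, and the object names wide_df and long_df are illustrative.

```r
# Three participants copied from the dataset preview (wide format).
wide_df <- data.frame(
  fkClientID = c("Participant_1", "Participant_4", "Participant_5"),
  d_interview_date_t1 = as.Date(c("2020-03-22", "2020-05-23", "2020-04-05")),
  d_interview_date_t2 = as.Date(c(NA, "2020-08-19", "2020-07-19")),
  gad7_t1 = c(6L, 12L, 14L),
  gad7_t2 = c(NA, 12L, 6L),
  phq9_t1 = c(7L, 17L, 22L),
  phq9_t2 = c(NA, 14L, 8L),
  stringsAsFactors = FALSE
)

# Stack the _t1 and _t2 columns, recording the timepoint in a "When" column.
long_df <- reshape(
  wide_df,
  direction = "long",
  idvar = "fkClientID",
  varying = list(
    c("d_interview_date_t1", "d_interview_date_t2"),
    c("gad7_t1", "gad7_t2"),
    c("phq9_t1", "phq9_t2")
  ),
  v.names = c("d_interview_date", "gad7", "phq9"),
  timevar = "When",
  times = c("t1", "t2")
)

# Order by participant and timepoint for readability.
long_df <- long_df[order(long_df$fkClientID, long_df$When), ]
rownames(long_df) <- NULL
long_df
```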

We drop records where we are missing data for either GAD7 or PHQ9 at either timepoint.

data_tb <- transform_ds_to_drop_msng(data_tb, predictors_chr = c("gad7", "phq9"), 
                                      uid_var_nm_1L_chr = "fkClientID")
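
The effect of this step can be sketched in base R on a small long-format example. The sketch assumes that a participant with a missing predictor value at any timepoint is removed entirely, which is consistent with the final dataset in this article containing only participants with both timepoints; it is not the youthu implementation, and the object names are illustrative.

```r
# Long-format rows for three participants; Participant_1 has no follow-up data.
long_df <- data.frame(
  fkClientID = rep(c("Participant_1", "Participant_4", "Participant_5"), each = 2),
  When = rep(c("t1", "t2"), times = 3),
  gad7 = c(6L, NA, 12L, 12L, 14L, 6L),
  phq9 = c(7L, NA, 17L, 14L, 22L, 8L),
  stringsAsFactors = FALSE
)

# Flag rows where either predictor is missing.
has_missing <- !stats::complete.cases(long_df[, c("gad7", "phq9")])

# Drop every row belonging to a participant with at least one flagged row.
ids_to_drop <- unique(long_df$fkClientID[has_missing])
complete_df <- long_df[!long_df$fkClientID %in% ids_to_drop, ]
complete_df
```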

We now predict AQoL-6D health utility for each case with complete data.

predn_ds_ls <- make_predn_metadata_ls(data_tb,
                                      id_var_nm_1L_chr = "fkClientID",
                                      msrmnt_date_var_nm_1L_chr = "d_interview_date",
                                      predr_vars_nms_chr = c(GAD7 = "gad7", PHQ9 = "phq9"),
                                      round_var_nm_1L_chr = "When",
                                      round_bl_val_1L_chr = "t1",
                                      utl_var_nm_1L_chr = "AQoL6D_HU",
                                      mdls_lup = mdls_lup,
                                      mdl_nm_1L_chr = "GAD7_PHQ9_1_OLS_CLL")
data_tb <- add_utl_predn(data_tb, new_data_is_1L_chr = "Predicted", predn_ds_ls = predn_ds_ls)
#> Joining with `by = join_by(fkClientID, When)`

Finally, we derive QALY predictions from the predicted health utility at both timepoints.

data_tb <- data_tb %>% add_qalys_to_ds(predn_ds_ls = predn_ds_ls, include_predrs_1L_lgl = FALSE, reshape_1L_lgl = TRUE)

data_tb %>% head() %>%
  ready4show::print_table(caption_1L_chr = "Final dataset", output_type_1L_chr = "HTML",
                          scroll_box_args_ls = list(width = "100%"))

Final dataset
fkClientID	d_interview_date_t1	d_interview_date_t2	gad7_t1	gad7_t2	phq9_t1	phq9_t2	AQoL6D_HU_t1	AQoL6D_HU_t2	AQoL6D_HU_change_dbl_t1	AQoL6D_HU_change_dbl_t2	duration_prd_t1	duration_prd_t2	qalys_dbl_t1	qalys_dbl_t2
Participant_10	2020-08-05	2020-11-07	15	13	17	18	0.3891806	0.6342526	0	0.2450720	0S	94d 0H 0M 0S	0	0.1316943
Participant_1000	2020-09-06	2020-12-20	13	10	13	10	0.6609298	0.2963083	0	-0.3646215	0S	105d 0H 0M 0S	0	0.1375907
Participant_1001	2020-07-05	2020-10-15	10	11	10	16	0.5324127	0.6192971	0	0.0868844	0S	102d 0H 0M 0S	0	0.1608137
Participant_1003	2020-05-18	2020-08-12	6	8	16	7	0.5630164	0.8584193	0	0.2954030	0S	86d 0H 0M 0S	0	0.1673422
Participant_1005	2020-05-09	2020-08-25	14	5	20	9	0.5090272	0.7799675	0	0.2709403	0S	108d 0H 0M 0S	0	0.1905701
Participant_1006	2020-05-29	2020-08-25	15	9	21	17	0.2969778	0.2734973	0	-0.0234805	0S	88d 0H 0M 0S	0	0.0687225
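
The QALY values in this table can be checked by hand. They are consistent with an area-under-the-curve (trapezoidal) calculation: the mean of the two utility predictions multiplied by the time between measurements in years, with a year taken as 365.25 days. This is inferred from the displayed output rather than from the youthu source, so treat the sketch below as an illustration under that assumption. The values are those shown for Participant_10.

```r
# Values copied from the first row of the final dataset (Participant_10).
utility_t1 <- 0.3891806
utility_t2 <- 0.6342526
date_t1 <- as.Date("2020-08-05")
date_t2 <- as.Date("2020-11-07")

# Time between measurements, in days.
duration_days <- as.numeric(difftime(date_t2, date_t1, units = "days"))

# Assumed formula: mean utility over the period, scaled to a 365.25-day year.
qalys <- mean(c(utility_t1, utility_t2)) * duration_days / 365.25
qalys
```

Under this assumption the result agrees with the tabulated qalys_dbl_t2 value of 0.1316943 to about six decimal places. Note that a participant whose utility falls between timepoints still accrues positive QALYs, because QALYs reflect time spent at a given utility rather than the change in utility.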

Last modified June 8, 2024: updated vignettes (77a947c)