Find and deploy utility mapping models

Tools from the youthu R package (soon to be formalised into ready4 modules) make it possible to find and deploy relevant utility mapping algorithms.

1: Example 1: Predict health utility from psychological and functional measures (PHQ-9 and SOFAS)
2: Example 2: Predict health utility from psychological measures (PHQ-9 and GAD-7)

1 - Example 1: Predict health utility from psychological and functional measures (PHQ-9 and SOFAS)

This tutorial illustrates the main steps for predicting AQoL-6D utility from psychological and functional measures using a longitudinal dataset in long format.

The section below renders a vignette article from the youthu library.

library(ready4)
library(ready4use)
library(youthu)

This vignette outlines a workflow for:

Searching, selecting and retrieving transfer to utility models;
Preparing a prediction dataset for use with a selected transfer to utility model; and
Applying the selected transfer to utility model to a prediction dataset to predict Quality Adjusted Life Years (QALYs).

The practical value of implementing such a workflow is discussed in the economic analysis vignette and a scientific manuscript. Note: this example uses fake data, so it should not be used to inform decision making.

Search, select and retrieve transfer to utility models

To identify datasets that contain transfer to utility models compatible with youthu (i.e., those developed with the TTU package), use the get_ttu_dv_dss function. The function searches the specified dataverses (in the example below, the TTU dataverse) for datasets containing output from the TTU package.

ttu_dv_dss_tb <- get_ttu_dv_dss("TTU")

The ttu_dv_dss_tb table summarises some pertinent details about each dataset containing TTU models found by the preceding command. These details include a link to any scientific summary (the “Article” column) associated with a dataset.

Transfer to Utility Datasets
ID	Utility	Predictors	Article
1	aqol6dtotalw	BADS total score, GAD7 total score, K6 total score, OASIS total score, PHQ9 total score, SCARED total score, SOFAS total score	

To identify models that predict a specified type of health utility and that use at least one of a specified set of predictors, use:

mdls_lup <- get_mdls_lup(ttu_dv_dss_tb = ttu_dv_dss_tb,
                         utility_type_chr = "AQoL-6D",
                         mdl_predrs_in_ds_chr = c("PHQ9 total score",
                                                  "SOFAS total score"))

The preceding command will produce a lookup table with information that includes the catalogue names of models, the predictors used in each model and the analysis that generated each one.

Selected elements from Models Look-Up Table
Catalogue reference	Predictors	Analysis
PHQ9_1_GLM_GSN_LOG	PHQ9	Primary Analysis
PHQ9_1_OLS_CLL	PHQ9	Primary Analysis
PHQ9_SOFAS_1_GLM_GSN_LOG	PHQ9, SOFAS	Primary Analysis
PHQ9_SOFAS_1_OLS_CLL	PHQ9, SOFAS	Primary Analysis
OASIS_SOFAS_1_GLM_GSN_LOG	OASIS, SOFAS	Primary Analysis
OASIS_SOFAS_1_OLS_CLL	OASIS, SOFAS	Primary Analysis
BADS_SOFAS_1_GLM_GSN_LOG	BADS, SOFAS	Primary Analysis
BADS_SOFAS_1_OLS_CLL	BADS, SOFAS	Primary Analysis
K6_SOFAS_1_GLM_GSN_LOG	K6, SOFAS	Primary Analysis
K6_SOFAS_1_OLS_CLL	K6, SOFAS	Primary Analysis
SCARED_SOFAS_1_GLM_GSN_LOG	SCARED, SOFAS	Primary Analysis
SCARED_SOFAS_1_OLS_CLL	SCARED, SOFAS	Primary Analysis
GAD7_SOFAS_1_GLM_GSN_LOG	GAD7, SOFAS	Primary Analysis
GAD7_SOFAS_1_OLS_CLL	GAD7, SOFAS	Primary Analysis
SOFAS_1_GLM_GSN_LOG	SOFAS	Secondary Analysis A
SOFAS_1_OLS_CLL	SOFAS	Secondary Analysis A
OASIS_PHQ9_1_GLM_GSN_LOG	OASIS, PHQ9	Secondary Analysis B
OASIS_PHQ9_1_OLS_CLL	OASIS, PHQ9	Secondary Analysis B
GAD7_PHQ9_1_GLM_GSN_LOG	GAD7, PHQ9	Secondary Analysis B
GAD7_PHQ9_1_OLS_CLL	GAD7, PHQ9	Secondary Analysis B
SCARED_PHQ9_1_GLM_GSN_LOG	SCARED, PHQ9	Secondary Analysis B
SCARED_PHQ9_1_OLS_CLL	SCARED, PHQ9	Secondary Analysis B
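The lookup table above includes models, such as OASIS_SOFAS_1_OLS_CLL, that also need a predictor other than PHQ9 or SOFAS. It can therefore be useful to narrow the list to models whose predictors are all present in the prediction dataset. The base-R sketch below shows that filtering step. The named list is typed in by hand from a few rows of the table and is a hypothetical stand-in, not the structure of the real mdls_lup object.

```r
# Predictors for a few catalogue references from the table above, typed in by
# hand for illustration (hypothetical stand-in, not the real mdls_lup object).
mdl_predrs_ls <- list(
  PHQ9_1_OLS_CLL        = "PHQ9",
  PHQ9_SOFAS_1_OLS_CLL  = c("PHQ9", "SOFAS"),
  OASIS_SOFAS_1_OLS_CLL = c("OASIS", "SOFAS"),
  SOFAS_1_OLS_CLL       = "SOFAS",
  GAD7_PHQ9_1_OLS_CLL   = c("GAD7", "PHQ9")
)

# Measures actually present in the prediction dataset.
available_chr <- c("PHQ9", "SOFAS")

# TRUE for each model whose every predictor is available.
usable_lgl <- vapply(mdl_predrs_ls,
                     function(predrs_chr) all(predrs_chr %in% available_chr),
                     logical(1))
usable_mdls_chr <- names(mdl_predrs_ls)[usable_lgl]
print(usable_mdls_chr)
```

With PHQ9 and SOFAS available, the models that use OASIS or GAD7 are dropped.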

To review the summary information about the predictive performance of a specific model, use:

get_dv_mdl_smrys(mdls_lup,
                 mdl_nms_chr = "PHQ9_SOFAS_1_OLS_CLL")
#> $PHQ9_SOFAS_1_OLS_CLL
#>        Parameter Estimate    SE          95% CI
#> 1 SD (Intercept)    0.348 0.017   0.312 , 0.382
#> 2      Intercept    0.428 0.129   0.174 , 0.686
#> 3  PHQ9 baseline   -9.115 0.249 -9.601 , -8.618
#> 4    PHQ9 change   -7.331 0.339 -8.007 , -6.665
#> 5 SOFAS baseline    0.960 0.172   0.616 , 1.292
#> 6   SOFAS change    1.146 0.235   0.674 , 1.607
#> 7             R2    0.767 0.012   0.743 , 0.788
#> 8           RMSE    0.925 0.004   0.922 , 0.928
#> 9          Sigma    0.406 0.012   0.384 , 0.429
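The parameter names in this summary (for example, "PHQ9 baseline" and "PHQ9 change") suggest that each predictor enters the model twice: once as its baseline value and once as its change from baseline. The base-R sketch below derives both terms from a long-format dataset shaped like the one used later in this example. It assumes that change means the current value minus the participant's baseline value, which is zero at baseline. That definition and the add_bl_and_change helper are our own illustration, not youthu's internal code.

```r
# Six rows shaped like the prediction dataset used later in this example.
data_df <- data.frame(
  UID = c("Participant_1", "Participant_10", "Participant_10",
          "Participant_100", "Participant_1000", "Participant_1000"),
  Timepoint = factor(c("Baseline", "Baseline", "Follow-up",
                       "Baseline", "Baseline", "Follow-up"),
                     levels = c("Baseline", "Follow-up")),
  PHQ_total = c(7L, 17L, 17L, 0L, 0L, 0L),
  SOFAS_total = c(69L, 60L, 64L, 76L, 71L, 71L)
)

# Hypothetical helper: adds <var>_baseline and <var>_change columns.
# Assumes exactly one baseline row per participant.
add_bl_and_change <- function(df, var_nm, id_nm = "UID",
                              round_nm = "Timepoint", bl_val = "Baseline") {
  is_bl <- df[[round_nm]] == bl_val
  # Look up each row's baseline value by matching on the participant ID.
  bl_vals <- df[[var_nm]][is_bl][match(df[[id_nm]], df[[id_nm]][is_bl])]
  df[[paste0(var_nm, "_baseline")]] <- bl_vals
  df[[paste0(var_nm, "_change")]] <- df[[var_nm]] - bl_vals
  df
}

data_df <- add_bl_and_change(data_df, "PHQ_total")
data_df <- add_bl_and_change(data_df, "SOFAS_total")
print(data_df)
```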

More information about a selected model can be found in the online model catalogue, the link to which can be obtained with the following command:

get_mdl_ctlg_url(mdls_lup,
                 mdl_nm_1L_chr = "PHQ9_SOFAS_1_OLS_CLL")
#> [1] "https://dataverse.harvard.edu/api/access/datafile/6484935"

Prepare a prediction dataset for use with a selected transfer to utility model

Import data

You can now import and inspect the dataset you plan to use for prediction. In the example below, we use fake data.

data_tb <- make_fake_ds_one()

Illustrative example of a prediction dataset
UID	Timepoint	Date	PHQ_total	SOFAS_total
Participant_1	Baseline	2022-12-20	7	69
Participant_10	Baseline	2022-11-16	17	60
Participant_10	Follow-up	2023-02-21	17	64
Participant_100	Baseline	2023-01-31	0	76
Participant_1000	Baseline	2023-02-05	0	71
Participant_1000	Follow-up	2023-04-10	0	71

Confirm dataset can be used as a prediction dataset

The prediction dataset must contain variables that correspond to all the predictors of the model you intend to apply. The allowable range and required class of each predictor variable are described in the min_val_dbl, max_val_dbl and class_chr columns of the model predictors lookup table, which can be accessed with a call to the get_predictors_lup function.

predictors_lup <- get_predictors_lup(mdls_lup = mdls_lup,
                                     mdl_nm_1L_chr = "PHQ9_SOFAS_1_OLS_CLL")

Model predictors lookup table
short_name_chr	long_name_chr	min_val_dbl	max_val_dbl	class_chr	increment_dbl	class_fn_chr	mdl_scaling_dbl	covariate_lgl
PHQ9	PHQ9 total score	0	27	integer	1	youthvars::youthvars_phq9	0.01	FALSE
SOFAS	SOFAS total score	0	100	integer	1	youthvars::youthvars_sofas	0.01	TRUE
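You can also check the prediction dataset yourself against the min_val_dbl, max_val_dbl and class_chr columns before building the metadata object. The base-R sketch below does this using lookup values copied from the table above. check_predrs is a hypothetical helper, not a youthu function. Its name-mapping argument follows the same model-name = dataset-name convention as the predr_vars_nms_chr argument used in the next step.

```r
# Lookup rows copied from the model predictors table above.
predictors_df <- data.frame(
  short_name_chr = c("PHQ9", "SOFAS"),
  min_val_dbl = c(0, 0),
  max_val_dbl = c(27, 100),
  class_chr = c("integer", "integer")
)

# Hypothetical helper: one row per predictor reporting presence, class and range.
# names(var_map_chr) are model predictor names; values are dataset column names.
check_predrs <- function(data_df, predictors_df, var_map_chr) {
  rows_ls <- lapply(seq_len(nrow(predictors_df)), function(i) {
    short_nm <- predictors_df$short_name_chr[i]
    x <- data_df[[var_map_chr[[short_nm]]]]
    data.frame(
      predictor = short_nm,
      present = !is.null(x),
      class_ok = !is.null(x) && inherits(x, predictors_df$class_chr[i]),
      range_ok = !is.null(x) &&
        all(x >= predictors_df$min_val_dbl[i] &
              x <= predictors_df$max_val_dbl[i], na.rm = TRUE)
    )
  })
  do.call(rbind, rows_ls)
}

data_df <- data.frame(PHQ_total = c(7L, 17L, 17L, 0L),
                      SOFAS_total = c(69L, 60L, 64L, 76L))
checks_df <- check_predrs(data_df, predictors_df,
                          c(PHQ9 = "PHQ_total", SOFAS = "SOFAS_total"))
print(checks_df)
```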

The prediction dataset must also include a unique client identifier variable and a measurement time-point identifier variable (which must be a factor with two levels). The dataset also needs to be in long format (i.e., measures taken at different time-points for the same individual are stacked in separate rows).

We can confirm that these conditions hold by creating a dataset metadata object with the make_predn_metadata_ls function. When it creates the metadata object, the function checks that the dataset can be used with the model specified in the mdl_nm_1L_chr argument. The names of the identifier, time-point and measurement date variables go to the id_var_nm_1L_chr, round_var_nm_1L_chr and msrmnt_date_var_nm_1L_chr arguments. The value that denotes the baseline time-point goes to round_bl_val_1L_chr. If the prediction dataset names the predictor variables differently from the predictors_lup lookup table, pass a named vector mapping the two sets of names to the predr_vars_nms_chr argument. Finally, to choose a variable name for the predicted utility values, pass it to the utl_var_nm_1L_chr argument.

predn_ds_ls <- make_predn_metadata_ls(data_tb,
                                      id_var_nm_1L_chr = "UID",
                                      msrmnt_date_var_nm_1L_chr = "Date",
                                      predr_vars_nms_chr = c(PHQ9 = "PHQ_total",SOFAS = "SOFAS_total"),
                                      round_var_nm_1L_chr = "Timepoint",
                                      round_bl_val_1L_chr = "Baseline",
                                      utl_var_nm_1L_chr = "AQoL6D_HU",
                                      mdls_lup = mdls_lup,
                                      mdl_nm_1L_chr = "PHQ9_SOFAS_1_OLS_CLL")
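The structural requirements described above can also be checked directly: an identifier with no missing values, a time-point variable that is a two-level factor, and long format. The sketch below is a hypothetical base-R helper, not part of youthu. It treats "long format" as meaning that no participant has more than one row for the same time-point.

```r
# Hypothetical helper: TRUE if data_df meets the structural requirements.
is_valid_long_ds <- function(data_df, id_nm, round_nm) {
  round_fct <- data_df[[round_nm]]
  is.factor(round_fct) &&
    nlevels(round_fct) == 2 &&
    !anyNA(data_df[[id_nm]]) &&
    # Long format: at most one row per participant per time-point.
    !anyDuplicated(data_df[, c(id_nm, round_nm)])
}

data_df <- data.frame(
  UID = c("Participant_10", "Participant_10", "Participant_100"),
  Timepoint = factor(c("Baseline", "Follow-up", "Baseline"),
                     levels = c("Baseline", "Follow-up"))
)
is_valid_long_ds(data_df, id_nm = "UID", round_nm = "Timepoint")
```

The function returns FALSE if the time-point variable is stored as character rather than as a factor, or if a participant-time-point pair appears twice.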

Apply the selected transfer to utility model to a prediction dataset to predict Quality Adjusted Life Years (QALYs)

Predict health utility at baseline and follow-up timepoints

To generate utility predictions we use the add_utl_predn function. The function needs to be supplied with the prediction dataset (the value passed to argument data_tb) and the validated prediction metadata object we created in the previous step.

data_tb <- add_utl_predn(data_tb,
                         predn_ds_ls = predn_ds_ls)
#> Joining with `by = join_by(UID, Timepoint)`

By default, the add_utl_predn function samples model parameter values from a table of model coefficients when making predictions, and constrains predictions to an allowed range. You can override these defaults with the additional arguments new_data_is_1L_chr = "Predicted" (which uses mean parameter values) and force_min_max_1L_lgl = F (which removes the range constraint). If the source dataset makes downloadable model objects available, you can also set make_from_tbl_1L_lgl = F. Changing these settings will produce different predictions. We strongly recommend consulting the model catalogue (see above) to understand how these choices may affect the validity of the predicted values.
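To show in general terms why these options change the output, the sketch below compares a prediction made with mean parameter values against one made with randomly drawn parameter values, then applies a range constraint. It assumes a simple linear model with hypothetical coefficients, independent normal draws from each estimate and its standard error, and illustrative bounds of 0 and 1. The actual sampling scheme, model transformations and bounds used by add_utl_predn may differ.

```r
set.seed(42)

# Hypothetical coefficient table (illustrative values, not from any TTU model).
coefs_df <- data.frame(
  term = c("Intercept", "score"),
  estimate = c(0.5, -0.02),
  se = c(0.05, 0.004)
)

# Linear predictor for a vector of scores, given an intercept and a slope.
predict_utl <- function(betas, score) betas[1] + betas[2] * score

# Clamp predictions to an allowed range (bounds are illustrative assumptions).
constrain <- function(utl, min_val = 0, max_val = 1) pmin(pmax(utl, min_val), max_val)

scores <- c(0, 10, 27)

# Analogous to predicting with mean parameter values.
mean_predn <- predict_utl(coefs_df$estimate, scores)

# Analogous to sampling parameter values from the coefficient table.
sampled_betas <- rnorm(nrow(coefs_df), mean = coefs_df$estimate, sd = coefs_df$se)
sampled_predn <- predict_utl(sampled_betas, scores)

# Unconstrained, the mean prediction for a score of 27 is -0.04 (below 0).
rbind(mean = mean_predn,
      mean_constrained = constrain(mean_predn),
      sampled_constrained = constrain(sampled_predn))
```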

Prediction dataset with predicted utilities
UID	Timepoint	Date	PHQ_total	SOFAS_total	AQoL6D_HU
Participant_1	Baseline	2022-12-20	7	69	0.9193293
Participant_10	Baseline	2022-11-16	17	60	0.6721956
Participant_10	Follow-up	2023-02-21	17	64	0.4242752
Participant_100	Baseline	2023-01-31	0	76	0.7530591
Participant_1000	Baseline	2023-02-05	0	71	0.7613385
Participant_1000	Follow-up	2023-04-10	0	71	0.9930864

Our health utility predictions are now available for use and are summarised below.

summary(data_tb$AQoL6D_HU)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> 0.06525 0.42832 0.62654 0.62142 0.83585 0.99999

Calculate QALYs

The last step is to calculate Quality Adjusted Life Years (QALYs), assuming that utility changes at a linear rate between time-points.

data_tb <- data_tb %>% add_qalys_to_ds(predn_ds_ls = predn_ds_ls,
                                       include_predrs_1L_lgl = F,
                                       reshape_1L_lgl = F)

Prediction dataset with QALYs
UID	Timepoint	Date	PHQ_total	SOFAS_total	AQoL6D_HU	AQoL6D_HU_change_dbl	duration_prd	qalys_dbl
Participant_1	Baseline	2022-12-20	7	69	0.9193293	0.0000000	0S	0.0000000
Participant_10	Baseline	2022-11-16	17	60	0.6721956	0.0000000	0S	0.0000000
Participant_10	Follow-up	2023-02-21	17	64	0.4242752	-0.2479204	97d 0H 0M 0S	0.1455957
Participant_100	Baseline	2023-01-31	0	76	0.7530591	0.0000000	0S	0.0000000
Participant_1000	Baseline	2023-02-05	0	71	0.7613385	0.0000000	0S	0.0000000
Participant_1000	Follow-up	2023-04-10	0	71	0.9930864	0.2317479	64d 0H 0M 0S	0.1537073
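Assuming a linear change in utility is equivalent to the trapezoidal rule: the QALYs accrued between two time-points equal the mean of the two utilities multiplied by the elapsed time in years. The base-R sketch below applies this rule to the two participants with follow-up data in the table above. With a 365.25-day year, it reproduces the qalys_dbl values shown to seven decimal places. It is our reconstruction, not youthu's code, and qalys_between is a hypothetical helper.

```r
# Baseline and follow-up rows copied from the "Prediction dataset with QALYs" table.
data_df <- data.frame(
  UID = c("Participant_10", "Participant_10",
          "Participant_1000", "Participant_1000"),
  Date = as.Date(c("2022-11-16", "2023-02-21", "2023-02-05", "2023-04-10")),
  AQoL6D_HU = c(0.6721956, 0.4242752, 0.7613385, 0.9930864)
)

# Hypothetical helper: QALYs accrued since the previous measurement, assuming
# utility changes linearly between measurements (trapezoidal rule) and a
# 365.25-day year. Inputs must already be sorted by date.
qalys_between <- function(utl, dates, days_per_year = 365.25) {
  years <- as.numeric(diff(dates), units = "days") / days_per_year
  c(0, (head(utl, -1) + tail(utl, -1)) / 2 * years)
}

# Sort by participant and date, then apply the helper within each participant.
data_df <- data_df[order(data_df$UID, data_df$Date), ]
data_df$qalys_dbl <- unsplit(
  lapply(split(data_df, data_df$UID),
         function(d) qalys_between(d$AQoL6D_HU, d$Date)),
  data_df$UID
)
print(data_df)
```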

2 - Example 2: Predict health utility from psychological measures (PHQ-9 and GAD-7)

This tutorial illustrates the main steps for predicting AQoL-6D utility from two psychological measures using a longitudinal dataset in wide format.

The section below renders a vignette article from the youthu library.

This vignette article is an abridged and modified version of another article on predicting Quality Adjusted Life Years with youthu.

Motivation

This article illustrates how to make QALY predictions using a dataset in wide format with no health-utility measures but containing two psychological measures (GAD-7 and PHQ-9).

Install youthu

If the youthu R library is not already installed, you will need to install it. As youthu is not yet available on CRAN, install it directly from its GitHub repository using an R package such as remotes or devtools.

# Uncomment and run if installation is required.
# utils::install.packages("devtools") 
# devtools::install_github("ready4-dev/youthu")

Load required packages

We now load the libraries we will use in subsequent steps. Note that the ready4, ready4show and ready4use ready4 framework libraries will have been installed automatically when youthu was installed. The specific readyforwhatsnext module library and the dplyr, purrr, stringr and tidyr CRAN libraries will have been installed at the same time.

library(ready4)
library(ready4show)
library(ready4use)
library(specific)
library(youthu)

Specify data sources

We begin by specifying the sources for our data. In this example, our data sources are online repositories.

X <- Ready4useRepos(dv_nm_1L_chr = "fakes", dv_ds_nm_1L_chr = "https://doi.org/10.7910/DVN/HJXYKQ", 
                    dv_server_1L_chr = "dataverse.harvard.edu",
                    gh_repo_1L_chr = "ready4-dev/youthu", gh_tag_1L_chr = "v0.0.0.91125")

Inspect dataset

We can now inspect the dataset we will use to make predictions. As this is a demonstration article, we create a custom synthetic dataset. Our first step is to ingest a preexisting synthetic dataset (in wide format), using the method explained in another vignette article.

data_tb <- ingest(X, fls_to_ingest_chr = c("ymh_phq_gad_tb"), metadata_1L_lgl = F)

Our resulting dataset has unique IDs for each participant (character class), timestamps for each data collection timepoint (Date class variables) and GAD-7 and PHQ-9 scores for each timepoint (integer class).

data_tb %>% head() %>% ready4show::print_table(caption_1L_chr = "Dataset", output_type_1L_chr = "HTML")

Dataset
fkClientID	d_interview_date_t1	d_interview_date_t2	gad7_t1	gad7_t2	phq9_t1	phq9_t2
Participant_1	2020-03-22	NA	6	NA	7	NA
Participant_2	2020-06-15	NA	12	NA	13	NA
Participant_3	2020-08-20	NA	16	NA	17	NA
Participant_4	2020-05-23	2020-08-19	12	12	17	14
Participant_5	2020-04-05	2020-07-19	14	6	22	8
Participant_6	2020-06-09	NA	8	NA	8	NA

Get mapping models

We retrieve details of relevant AQoL-6D mapping models for either of the predictors we plan to use. How these models were derived is described in a pre-print. Details of model performance are included in catalogues available in an open access data repository.

mdls_lup <- get_mdls_lup(ttu_dv_dss_tb = get_ttu_dv_dss("TTU"),
                         utility_type_chr = "AQoL-6D",
                         mdl_predrs_in_ds_chr = c("GAD7 total score", "PHQ9 total score"))

mdls_lup[,c(1,2,5)] %>% 
  ready4show::print_table(caption_1L_chr = "Available models", output_type_1L_chr = "HTML")

Available models
mdl_nms_chr	predrs_ls	source_chr
PHQ9_1_GLM_GSN_LOG	PHQ9	Primary Analysis
PHQ9_1_OLS_CLL	PHQ9	Primary Analysis
GAD7_1_GLM_GSN_LOG	GAD7	Primary Analysis
GAD7_1_OLS_CLL	GAD7	Primary Analysis
PHQ9_SOFAS_1_GLM_GSN_LOG	PHQ9, SOFAS	Primary Analysis
PHQ9_SOFAS_1_OLS_CLL	PHQ9, SOFAS	Primary Analysis
GAD7_SOFAS_1_GLM_GSN_LOG	GAD7, SOFAS	Primary Analysis
GAD7_SOFAS_1_OLS_CLL	GAD7, SOFAS	Primary Analysis
OASIS_PHQ9_1_GLM_GSN_LOG	OASIS, PHQ9	Secondary Analysis B
OASIS_PHQ9_1_OLS_CLL	OASIS, PHQ9	Secondary Analysis B
GAD7_PHQ9_1_GLM_GSN_LOG	GAD7, PHQ9	Secondary Analysis B
GAD7_PHQ9_1_OLS_CLL	GAD7, PHQ9	Secondary Analysis B
SCARED_PHQ9_1_GLM_GSN_LOG	SCARED, PHQ9	Secondary Analysis B
SCARED_PHQ9_1_OLS_CLL	SCARED, PHQ9	Secondary Analysis B

We select our preferred model and retrieve summary data about the model’s predictor variables.

predictors_lup <- get_predictors_lup(mdls_lup = mdls_lup, mdl_nm_1L_chr = "GAD7_PHQ9_1_OLS_CLL")

exhibit(predictors_lup)

Variable	Description	Minimum	Maximum	Class	Increment	Function	Scaling	Covariate
GAD7	GAD7 total score	0	21	integer	1	youthvars::youthvars_gad7	0.01	FALSE
PHQ9	PHQ9 total score	0	27	integer	1	youthvars::youthvars_phq9	0.01	FALSE

Transform prediction dataset

To be used with the mapping models available to us, our prediction dataset needs to be in long format. We perform the necessary transformation.

data_tb <- transform_ds_to_long(data_tb, predictors_chr = c("gad7", "phq9"),
                             msrmnt_date_var_nm_1L_chr = "d_interview_date", round_var_nm_1L_chr = "When")
#> Joining with `by = join_by(case_id, fkClientID, When)`
#> Joining with `by = join_by(case_id, fkClientID, When)`
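transform_ds_to_long performs this reshaping for you. To show what the transformation does, the sketch below carries out a comparable wide-to-long reshape with base R's stats::reshape on two participants copied from the dataset shown earlier. The output column names mirror those used in this example, but the sketch is not how transform_ds_to_long is implemented. For instance, it does not add the case_id column that appears in the join messages above.

```r
# Two participants copied from the "Dataset" table above (wide format).
wide_df <- data.frame(
  fkClientID = c("Participant_1", "Participant_4"),
  d_interview_date_t1 = as.Date(c("2020-03-22", "2020-05-23")),
  d_interview_date_t2 = as.Date(c(NA, "2020-08-19")),
  gad7_t1 = c(6L, 12L),
  gad7_t2 = c(NA, 12L),
  phq9_t1 = c(7L, 17L),
  phq9_t2 = c(NA, 14L)
)

# Stack each measure's _t1 and _t2 columns into one column per measure,
# recording the time-point in a new "When" column.
long_df <- stats::reshape(
  wide_df,
  direction = "long",
  varying = list(
    c("d_interview_date_t1", "d_interview_date_t2"),
    c("gad7_t1", "gad7_t2"),
    c("phq9_t1", "phq9_t2")
  ),
  v.names = c("d_interview_date", "gad7", "phq9"),
  timevar = "When",
  times = c("t1", "t2"),
  idvar = "fkClientID"
)

# Order rows by participant, then time-point, and reset the row names.
long_df <- long_df[order(long_df$fkClientID, long_df$When), ]
rownames(long_df) <- NULL
print(long_df)
```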

We drop records where we are missing data for either GAD7 or PHQ9 at either timepoint.

data_tb <- transform_ds_to_drop_msng(data_tb, predictors_chr = c("gad7", "phq9"), 
                                      uid_var_nm_1L_chr = "fkClientID")
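The rule described above can be sketched in base R: keep a participant only if every one of their rows has both GAD-7 and PHQ-9 values. The sketch assumes each participant has one row per time-point, as a wide-to-long reshape produces. transform_ds_to_drop_msng's exact rule may differ.

```r
# Long-format rows shaped like the reshaped dataset (values from the earlier table).
long_df <- data.frame(
  fkClientID = c("Participant_1", "Participant_1", "Participant_4", "Participant_4"),
  When = c("t1", "t2", "t1", "t2"),
  gad7 = c(6L, NA, 12L, 12L),
  phq9 = c(7L, NA, 17L, 14L)
)

# Flag rows with no missing predictor values.
row_complete_lgl <- complete.cases(long_df[, c("gad7", "phq9")])

# A participant is kept only if all of their rows are complete.
id_complete_lgl <- ave(row_complete_lgl, long_df$fkClientID, FUN = all)
complete_df <- long_df[id_complete_lgl, ]
print(complete_df)
```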

We now predict AQoL-6D health utility for each case with complete data.

predn_ds_ls <- make_predn_metadata_ls(data_tb,
                                      id_var_nm_1L_chr = "fkClientID",
                                      msrmnt_date_var_nm_1L_chr = "d_interview_date",
                                      predr_vars_nms_chr = c(GAD7 = "gad7", PHQ9 = "phq9"),
                                      round_var_nm_1L_chr = "When",
                                      round_bl_val_1L_chr = "t1",
                                      utl_var_nm_1L_chr = "AQoL6D_HU",
                                      mdls_lup = mdls_lup,
                                      mdl_nm_1L_chr = "GAD7_PHQ9_1_OLS_CLL")
data_tb <- add_utl_predn(data_tb, new_data_is_1L_chr = "Predicted", predn_ds_ls = predn_ds_ls)
#> Joining with `by = join_by(fkClientID, When)`

Finally, we derive QALY predictions from the health utility measures at both time-points.

data_tb <- data_tb %>% add_qalys_to_ds(predn_ds_ls = predn_ds_ls, include_predrs_1L_lgl = F, reshape_1L_lgl = T)

data_tb %>% head() %>%
  ready4show::print_table(caption_1L_chr = "Final dataset", output_type_1L_chr = "HTML",
                          scroll_box_args_ls = list(width = "100%"))

Final dataset
fkClientID	d_interview_date_t1	d_interview_date_t2	gad7_t1	gad7_t2	phq9_t1	phq9_t2	AQoL6D_HU_t1	AQoL6D_HU_t2	AQoL6D_HU_change_dbl_t1	AQoL6D_HU_change_dbl_t2	duration_prd_t1	duration_prd_t2	qalys_dbl_t1	qalys_dbl_t2
Participant_10	2020-08-05	2020-11-07	15	13	17	18	0.3891806	0.6342526	0	0.2450720	0S	94d 0H 0M 0S	0	0.1316943
Participant_1000	2020-09-06	2020-12-20	13	10	13	10	0.6609298	0.2963083	0	-0.3646215	0S	105d 0H 0M 0S	0	0.1375907
Participant_1001	2020-07-05	2020-10-15	10	11	10	16	0.5324127	0.6192971	0	0.0868844	0S	102d 0H 0M 0S	0	0.1608137
Participant_1003	2020-05-18	2020-08-12	6	8	16	7	0.5630164	0.8584193	0	0.2954030	0S	86d 0H 0M 0S	0	0.1673422
Participant_1005	2020-05-09	2020-08-25	14	5	20	9	0.5090272	0.7799675	0	0.2709403	0S	108d 0H 0M 0S	0	0.1905701
Participant_1006	2020-05-29	2020-08-25	15	9	21	17	0.2969778	0.2734973	0	-0.0234805	0S	88d 0H 0M 0S	0	0.0687225
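Because the final dataset is in wide format, you can recompute its change and QALY columns directly from the paired _t1 and _t2 columns as a sanity check. The base-R sketch below does this for the first two participants. It assumes linear change in utility between time-points and a 365.25-day year, and it matches the displayed AQoL6D_HU_change_dbl_t2 and qalys_dbl_t2 values to within rounding. It is a check written for this example, not youthu's implementation.

```r
# Two rows copied from the "Final dataset" table above.
wide_df <- data.frame(
  fkClientID = c("Participant_10", "Participant_1000"),
  d_interview_date_t1 = as.Date(c("2020-08-05", "2020-09-06")),
  d_interview_date_t2 = as.Date(c("2020-11-07", "2020-12-20")),
  AQoL6D_HU_t1 = c(0.3891806, 0.6609298),
  AQoL6D_HU_t2 = c(0.6342526, 0.2963083)
)

# Change in utility from t1 to t2.
wide_df$change_t2 <- wide_df$AQoL6D_HU_t2 - wide_df$AQoL6D_HU_t1

# Elapsed time in years, assuming a 365.25-day year.
years <- as.numeric(wide_df$d_interview_date_t2 - wide_df$d_interview_date_t1,
                    units = "days") / 365.25

# Linear change between time-points: mean utility multiplied by elapsed years.
wide_df$qalys_t2 <- (wide_df$AQoL6D_HU_t1 + wide_df$AQoL6D_HU_t2) / 2 * years
print(wide_df)
```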