Procedure
Schedule of data collection
Patient recruitment began in 1994. The first treatments began in 1995 and the last in 2000. A 10-year follow-up from the start of treatment has been completed.
Ways of data collection
Patients were assessed at baseline, before random assignment to the different treatments. The assessments were then repeated during and after treatment, a total of 14 times during the follow-up. Information was collected using questionnaires, videotaped interviews, psychological tests, and laboratory measurements.
Additional information on health and service utilization was collected annually from nationwide population registers from the start of the study to the end of the follow-up. Register data covered prescription medication, periods of hospital treatment, use of rehabilitation services, causes of death, and income taxes.
Information on the therapists was collected at baseline, and information on the therapy process at each measurement point throughout the study treatment.
Extent of measurements
The assessments were carried out at baseline and during the follow-up at 3, 7, and 9 months and at 1, 1.5, 2, 3, 4, 5, 6, 7, 8, 9, and 10 years. Questionnaires were administered on almost all of these occasions and at the termination of the study treatment. The questionnaires administered at 3 and 9 months and at 1.5, 2, 4, 6, and 10 years were brief.
Interviews were carried out at baseline, twice during the first year (at 7 months and 1 year), and thereafter every second year up to the 7-year follow-up. Psychological tests were carried out at baseline and repeated at the 3- and 5-year follow-ups. Register data covered all measurement occasions.
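For reference, the measurement schedule can be encoded as a small data structure. This is a minimal sketch: month offsets are read from the text above (1.5 years taken as 18 months), the 3-, 5- and 7-year interview occasions are our interpretation of "every second year up to the 7-year follow-up", and all variable and function names are hypothetical.

```python
# Measurement occasions in months from the start of therapy (0 = baseline).
BASELINE = 0
FOLLOW_UP = [3, 7, 9, 12, 18, 24, 36, 48, 60, 72, 84, 96, 108, 120]

# Occasions with the brief questionnaire (3 and 9 months; 1.5, 2, 4, 6, 10 years).
BRIEF_QUESTIONNAIRE = [3, 9, 18, 24, 48, 72, 120]

# Interviews: baseline, 7 months, 1 year, then every second year up to 7 years
# (the 3-, 5- and 7-year points are an interpretation of the text).
INTERVIEWS = [BASELINE, 7, 12] + list(range(36, 85, 24))

# Psychological tests: baseline and the 3- and 5-year follow-ups.
PSYCH_TESTS = [BASELINE, 36, 60]


def known_instruments(month):
    """List the instruments the text explicitly ties to a given month.

    Register data covered every occasion; questionnaires were given on
    "almost all" occasions, which the text does not itemize, so only the
    brief-questionnaire occasions are flagged here.
    """
    labels = ["register data"]
    if month in BRIEF_QUESTIONNAIRE:
        labels.append("brief questionnaire")
    if month in INTERVIEWS:
        labels.append("interview")
    if month in PSYCH_TESTS:
        labels.append("psychological tests")
    return labels


# Sanity check against the text: 14 follow-up assessments after baseline.
assert len(FOLLOW_UP) == 14
```

For example, `known_instruments(36)` lists register data, an interview, and psychological tests for the 3-year follow-up.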
Implementation of measurements
The decision on eligibility was made by a psychiatrist on the basis of a screening questionnaire and interview. Eligible patients completed an extensive set of baseline questionnaires, attended interviews, and participated in psychological and laboratory tests.
Randomization and assignment to therapy were carried out after three baseline interview sessions. Patients in the psychoanalysis group underwent a similar assessment procedure, except that they were self-selected for this treatment and for their analyst. The follow-up schedule started at the beginning of therapy.
Statistical analysis
Effectiveness study
The effectiveness of the four therapies was compared in the intention-to-treat (ITT) sample, which gives the clinical effect of the treatment policy. The data contained repeated measurements of the outcome variables. The primary analyses assumed ignorable dropout. In secondary analyses, missing values were replaced by multiple imputation (Rubin 1987). Continuous outcome variables were analyzed with linear mixed models (Verbeke and Molenberghs 1997), and binary outcomes with logistic regression models and generalized estimating equations (GEE; Liang and Zeger 1986).
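In the multiple-imputation analyses, the model is fitted separately to each imputed data set and the results are then combined. The sketch below shows the standard combining rules of Rubin (1987) for a single coefficient, such as a treatment difference from a mixed model; the function name and the example numbers are hypothetical, and the completed-data model fits themselves are assumed to have been done already.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates: point estimates of one coefficient from the m imputed data sets
    variances: the corresponding squared standard errors
    Returns the pooled estimate, its total variance, and the degrees of
    freedom of the t reference distribution.
    """
    m = len(estimates)
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b      # total variance
    r = (1 + 1 / m) * b / w      # relative increase in variance due to missing data
    df = (m - 1) * (1 + 1 / r) ** 2 if b > 0 else math.inf
    return q_bar, t, df


# Hypothetical treatment-difference estimates from three imputed data sets.
pooled, total_var, df = pool_rubin([1.2, 1.0, 1.1], [0.04, 0.05, 0.045])
```

The pooled standard error is `math.sqrt(total_var)`; in practice far more than three imputations are usually drawn.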
Model-adjusted statistics were calculated as predictive margins at different design points (Lee 1981, Graubard and Korn 1999). For continuous outcomes, absolute means and their differences were estimated; for binary outcomes, prevalences and relative risks or odds ratios. Confidence intervals were calculated with the delta method (Migon and Gamerman 1999), and statistical significance was tested with the Wald test.
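To show how predictive margins, the delta method, and the Wald test fit together for a binary outcome, the sketch below assumes a hypothetical logistic model with one treatment indicator and one covariate whose coefficients and covariance matrix are already available. Each margin averages the predicted probabilities over the observed sample with treatment set to 0 or to 1 for everyone; the relative risk is the ratio of the two margins, and its interval and Wald statistic use a delta-method variance on the log scale. The models and design points actually used in the study are not reproduced here.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def margins_relative_risk(beta, cov, covariate, z=1.96):
    """Predictive margins and relative risk from a fitted logistic model.

    Hypothetical model: logit P(y = 1) = b0 + b1 * treat + b2 * x.
    beta: (b0, b1, b2); cov: 3 x 3 covariance matrix of beta;
    covariate: observed x values over which the margins are averaged.
    """
    n = len(covariate)
    margins, grads = [], []
    for treat in (0, 1):
        pm, grad = 0.0, [0.0, 0.0, 0.0]
        for x in covariate:
            p = expit(beta[0] + beta[1] * treat + beta[2] * x)
            pm += p / n
            d = p * (1 - p) / n  # derivative of p w.r.t. the linear predictor
            for k, xk in enumerate((1.0, treat, x)):
                grad[k] += d * xk
        margins.append(pm)
        grads.append(grad)
    pm0, pm1 = margins
    log_rr = math.log(pm1 / pm0)
    # Delta method: gradient of log RR is dPM1/PM1 - dPM0/PM0.
    g = [grads[1][k] / pm1 - grads[0][k] / pm0 for k in range(3)]
    se = math.sqrt(sum(g[i] * cov[i][j] * g[j] for i in range(3) for j in range(3)))
    wald_z = log_rr / se  # Wald statistic for H0: RR = 1
    ci = (math.exp(log_rr - z * se), math.exp(log_rr + z * se))
    return pm0, pm1, math.exp(log_rr), ci, wald_z
```

The same pattern gives odds ratios or, for continuous outcomes, mean differences by changing the function averaged over the sample and the contrast taken.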
Cost-effectiveness analyses were performed using the incremental cost-effectiveness ratio (ICER), defined as the difference in mean costs between two therapies divided by the difference in their mean effectiveness (Drummond et al. 2005). The effectiveness of each therapy was estimated as the area under the outcome curve (AUC), expressed as the mean value of the outcome variable over the follow-up period (Pruessner et al. 2003). Confidence intervals for the ICERs were estimated using bootstrap methods. Log-transformed costs were modeled by linear regression with treatment group as the independent variable. Multidimensional sensitivity analyses were performed.
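The two quantities behind the cost-effectiveness analysis can be sketched in plain Python. Assumptions: each patient's outcome trajectory is summarized by trapezoidal integration divided by the follow-up length, effectiveness is coded so that a larger value is better (a symptom score would need its sign reversed), and the interval is a simple percentile bootstrap that resamples patients within each group. The bootstrap variant used in the study is not stated, and the function names are hypothetical.

```python
import random
from statistics import mean


def time_averaged_auc(times, values):
    """Trapezoidal area under an outcome curve divided by the follow-up
    length, i.e. the mean outcome level over the follow-up period."""
    area = sum((t1 - t0) * (v0 + v1) / 2
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))
    return area / (times[-1] - times[0])


def icer(cost_a, eff_a, cost_b, eff_b):
    """Incremental cost-effectiveness ratio of therapy A versus therapy B."""
    return (mean(cost_a) - mean(cost_b)) / (mean(eff_a) - mean(eff_b))


def bootstrap_icer_ci(cost_a, eff_a, cost_b, eff_b, reps=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the ICER.

    Patients are resampled with replacement within each therapy group,
    keeping each patient's cost and effectiveness together.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        ia = [rng.randrange(len(cost_a)) for _ in cost_a]
        ib = [rng.randrange(len(cost_b)) for _ in cost_b]
        try:
            draws.append(icer([cost_a[i] for i in ia], [eff_a[i] for i in ia],
                              [cost_b[i] for i in ib], [eff_b[i] for i in ib]))
        except ZeroDivisionError:
            continue  # resample with identical mean effects: ratio undefined
    draws.sort()
    lower = draws[int(alpha / 2 * len(draws))]
    upper = draws[int((1 - alpha / 2) * len(draws)) - 1]
    return lower, upper
```

Because the ICER is a ratio, resamples in which the mean effects nearly coincide produce extreme values, so the interval should be read together with the underlying cost and effect differences.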
Efficacy approximation
Efficacy was approximated by "as-treated" (AT) analyses that take into account compliance and the use of auxiliary treatment. Bayesian inference (Gelman et al. 1995) and dynamical models (Eerola et al. 2003, Commenges and Gégout-Petit 2009), which account for the dynamic interdependence between the outcome and auxiliary-treatment processes during follow-up, were also applied.
Sufficiency study
The prevalence and incidence of auxiliary treatment were considered indicators of the sufficiency of the therapies provided and were used as outcome variables in the effectiveness analyses related to sufficiency. Prevalences of auxiliary treatment were compared using the same methods as in the effectiveness study, and the incidence of auxiliary treatment was modeled using Cox regression (Cox 1972).
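As an illustration of the incidence model, the sketch below fits a Cox proportional hazards model with a single covariate by Newton-Raphson on the partial likelihood, using the Breslow approximation for tied times. It is a minimal, hypothetical example (one binary therapy indicator, no adjustment covariates); the study's analyses were run with standard software and may include further covariates.

```python
import math


def cox_single_covariate(times, events, x, iters=25):
    """Newton-Raphson fit of a one-covariate Cox model (Breslow ties).

    times:  follow-up time to start of auxiliary treatment or censoring
    events: 1 if auxiliary treatment started, 0 if censored
    x:      covariate, e.g. 1 = therapy A, 0 = therapy B (hypothetical coding)
    Returns (beta, standard error); exp(beta) is the hazard ratio.
    """
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for i, (ti, di) in enumerate(zip(times, events)):
            if not di:
                continue
            # Risk set: everyone still under observation at time ti.
            risk = [j for j, tj in enumerate(times) if tj >= ti]
            s0 = sum(math.exp(beta * x[j]) for j in risk)
            s1 = sum(x[j] * math.exp(beta * x[j]) for j in risk)
            s2 = sum(x[j] ** 2 * math.exp(beta * x[j]) for j in risk)
            score += x[i] - s1 / s0                 # partial-likelihood score
            info += s2 / s0 - (s1 / s0) ** 2        # observed information
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta, 1 / math.sqrt(info)
```

With perfectly separated groups (every event in one group preceding every event in the other) the estimate diverges, so such data need a different approach.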
Suitability study
Whether patient-, therapist-, or therapy-related factors, measured at baseline or during the therapy process, differentially predict the outcome of psychotherapies of different types or lengths was studied using the same methods as in the effectiveness study, with an interaction between therapy group and the factor of interest included as a predictor. Because the patients were not randomized with respect to these factors, potential confounders need to be adjusted for in the models.
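The interaction approach can be sketched with a simple design matrix. To keep the example short, the sketch uses ordinary least squares on a single outcome occasion and two therapy groups; the study itself used the mixed and GEE models of the effectiveness analysis, to which the same interaction term would be added. The function name and coding are hypothetical.

```python
import numpy as np


def fit_interaction_model(y, group, factor, confounders):
    """Least-squares fit of y ~ group + factor + group:factor + confounders.

    group:       1 = therapy A, 0 = therapy B (hypothetical two-group coding)
    factor:      the patient-, therapist- or therapy-related predictor
    confounders: 2-D array (n x p) of adjustment covariates
    The group:factor coefficient measures how much the factor's association
    with outcome differs between the two therapy groups.
    """
    X = np.column_stack([np.ones_like(y), group, factor, group * factor, confounders])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "group", "factor", "group:factor"], coef[:4]))
```

A group:factor coefficient near zero means the factor predicts outcome similarly in both therapies; a non-zero value indicates differential prediction, i.e. the factor may bear on suitability.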
The results can be compared with evidence from the current literature by meta-analysis, in which measures of the strength of association between the predictor and the outcome, such as correlation coefficients, relative risks, and odds ratios, are pooled using random-effects models (DerSimonian and Laird 1986, Knekt et al. 2004). Once prediction based on patient-, therapist-, or therapy-related factors has been comprehensively studied, the relative importance of these factors can be compared using the population attributable fraction (PAF), which assesses the proportion of the psychotherapy outcome attributable to each factor (Laaksonen et al. 2010).
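A minimal sketch of DerSimonian-Laird random-effects pooling follows, assuming the study-level measures have already been put on an additive scale (log relative risks, log odds ratios, or Fisher-z transformed correlations). For the PAF, only the classical single-factor formula of Levin is shown, as a simpler relative of the model-based estimator cited above; it is not the Laaksonen et al. method. Function names are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling of study-level effects.

    effects:   per-study estimates on an additive scale
    variances: their squared standard errors
    Returns (pooled effect, its standard error, tau^2).
    """
    w = [1 / v for v in variances]                                  # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))   # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)                   # between-study variance
    w_star = [1 / (v + tau2) for v in variances]                    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2


def levin_paf(prevalence, relative_risk):
    """Classical single-factor PAF (Levin's formula): the share of outcome
    cases attributable to a factor with the given prevalence and RR."""
    excess = prevalence * (relative_risk - 1)
    return excess / (1 + excess)
```

When the studies are homogeneous, tau^2 is truncated at zero and the pooled estimate coincides with the fixed-effect one.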
Quality control
In the analysis of quality-control data, the agreement between measurements and their repeatability were estimated as intraclass correlation coefficients: the kappa coefficient for categorical data (Fleiss 1981) and the reliability coefficient for continuous data (Winer 1971).
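The two agreement statistics can be sketched as follows. Because each interview was rated by several raters, the categorical case is illustrated with Fleiss' multi-rater kappa; for continuous ratings, a one-way random-effects ICC computed from ANOVA mean squares is shown as one common form of reliability coefficient. The exact kappa and ICC variants used in the study are not stated here, so both functions (names hypothetical) are illustrations only.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for N subjects each rated by the same number k of raters.

    counts[i][j] = number of raters assigning subject i to category j.
    """
    n_sub = len(counts)
    k = sum(counts[0])
    total = n_sub * k
    # Overall proportion of all assignments falling in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    # Observed pairwise agreement among raters for each subject.
    p_i = [(sum(c * c for c in row) - k) / (k * (k - 1)) for row in counts]
    p_bar = sum(p_i) / n_sub
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


def icc_oneway(ratings):
    """One-way random-effects ICC(1) from an n_subjects x k_raters table,
    based on between- and within-subject mean squares of a one-way ANOVA."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    means = [sum(r) / k for r in ratings]
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for r, m in zip(ratings, means) for x in r) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

For the rater study described under Quality control, each row would be one video-recorded interview and each column one rater.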
Program packages
The main statistical analyses were carried out using the MIXED, GENMOD, and PHREG procedures of the SAS/STAT software and the IML procedure of the SAS/IML software (SAS Institute Inc. 2004). Bayesian inference was conducted using the WinBUGS (Lunn et al. 2000) and OpenBUGS (Thomas et al. 2006) software packages.
Quality control
The quality of the interview data was continuously controlled and evaluated in several separate designs (Knekt and Lindfors 2004). The quality-control designs had two primary focuses: evaluating the consistency of the assessments, and methodological research, i.e. evaluating the applicability, comparability, reliability, and validity of the methods used and of the new measures developed in the HPS. Agreement between raters and the long-term stability of the ratings were evaluated in a sample of 39 video-recorded interviews, rated independently by 5 psychologists and 2 psychiatrists at two time points (baseline and 3-year follow-up).
Methodological quality-control research comprised several sub-studies and focused on
- determining agreement between self-reported and interview-assessed psychiatric symptoms
- comparing diagnoses based on semi-structured diagnostic interviews (Knekt and Lindfors 2004) with those based on the Structured Clinical Interview for DSM-IV Axis I and Axis II Disorders (SCID) (First et al. 1995, 1997)
- comparing different methods for computing overall indices of symptoms and functional capacity
- assessing the quality of proxy outcome assessments (PSQ, Table 1)
- evaluating the agreement between self-reported and register-based information on the use of psychotropic medication, and
- assessing symptomatic improvement during the waiting time before therapy (Holi et al. 2003).