Validated mental health screeners — what HADS, PHQ-9, and GAD-7 actually measure

A validated screener is a short questionnaire with a published study behind it showing that, on average, it can tell people with a given condition apart from people without it at a known rate of error. That is a narrower claim than most app marketing makes, and a much narrower claim than the questionnaires' titles suggest. The "Generalized Anxiety Disorder 7-item scale" does not diagnose generalized anxiety disorder, and the same goes for the other instruments covered here. Each is a screening instrument with a published sensitivity and specificity and a defined population it was tested in.

The three screeners that show up most often in clinical use, research, and apps are HADS, PHQ-9, and GAD-7. Colors includes the standard HADS under its Tests section, with the same cut-offs as the original 1983 paper.

What a validated screener is

A questionnaire becomes "validated" when researchers test it against a clinical reference standard (usually a structured diagnostic interview by a trained clinician) and report how often the two agree. Two numbers do most of the work. Sensitivity is the proportion of people with the condition the screener correctly flags. Specificity is the proportion of people without the condition it correctly leaves alone. The two pull against each other: lowering the cut-off catches more true cases but flags more people who are well, and raising it does the reverse.
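As a minimal sketch of those two definitions, the Python below computes both numbers from the four cells of a validation table. The counts are invented for illustration and do not come from any of the cited studies.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of people WITH the condition that the screener flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of people WITHOUT the condition that the screener leaves alone."""
    return true_neg / (true_neg + false_pos)


# Hypothetical validation sample: the reference interview says 100 people
# have the condition and 400 do not. These counts are made up.
true_pos, false_neg = 80, 20    # among the 100 with the condition
true_neg, false_pos = 340, 60   # among the 400 without it

print(sensitivity(true_pos, false_neg))  # 0.8
print(specificity(true_neg, false_pos))  # 0.85
```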

The screeners here were published in mainstream clinical journals: HADS in Acta Psychiatrica Scandinavica,¹ PHQ-9 in the Journal of General Internal Medicine,³ GAD-7 in Archives of Internal Medicine.⁴ The validation papers are public, the cut-offs are documented, and decades of follow-up studies exist. That is what separates them from the long tail of unvalidated quizzes online.

HADS: how it works and what it measures

The Hospital Anxiety and Depression Scale was designed by Zigmond and Snaith in 1983 to screen for anxiety and depression in non-psychiatric hospital outpatients.¹ The clinical problem they were solving: physical illness and its treatments produce somatic symptoms (poor sleep, low energy, weight change) that overlap heavily with the items on most depression questionnaires of the time. A general medical patient could score high on a depression screener simply because they were physically unwell.

HADS leaves the somatic items out. It has 14 questions (seven for anxiety, seven for depression), each rated 0 to 3, giving a 0–21 range per subscale. The original paper proposed three bands per subscale: 0–7 normal, 8–10 borderline, 11 or above clinical. Those cut-offs are still the ones in standard use.
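The scoring just described can be sketched in a few lines of Python. This is an illustration, not the app's implementation: the function names are hypothetical, and the sketch assumes each response has already been converted to its 0–3 value using the published scoring key.

```python
def hads_subscale_score(item_scores: list[int]) -> int:
    """Sum seven already-scored items (each 0-3) into a 0-21 subscale score."""
    if len(item_scores) != 7:
        raise ValueError("a HADS subscale has exactly seven items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def hads_band(subscale_score: int) -> str:
    """Map a subscale score to the three bands quoted above."""
    if not 0 <= subscale_score <= 21:
        raise ValueError("subscale score must be between 0 and 21")
    if subscale_score <= 7:
        return "normal"
    if subscale_score <= 10:
        return "borderline"
    return "clinical"


# Hypothetical responses, for illustration only.
anxiety_items = [2, 1, 2, 1, 1, 2, 1]
depression_items = [0, 1, 0, 1, 0, 1, 0]

anxiety = hads_subscale_score(anxiety_items)
depression = hads_subscale_score(depression_items)
print(anxiety, hads_band(anxiety))        # 10 borderline
print(depression, hads_band(depression))  # 3 normal
```

The two subscales are scored and banded separately; the sketch deliberately does not add them into a single total.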

Bjelland and colleagues' 2002 review in the Journal of Psychosomatic Research pulled together 747 studies that had used HADS in the prior two decades.² Across that body of work, the anxiety and depression subscales each showed sensitivity and specificity of roughly 0.80 against clinical interview at the standard cut-offs. Both subscales correlated well with other anxiety and depression measures, and the two-factor structure (anxiety vs depression) replicated across populations. That is a usable instrument: not perfect, but well-characterised.

PHQ-9 and GAD-7

The Patient Health Questionnaire 9-item depression module is built directly from the DSM-IV criteria for major depression. Each of nine items maps to one of the nine diagnostic criteria, scored 0 (not at all) to 3 (nearly every day) over the past two weeks. Total score 0–27. Kroenke, Spitzer, and Williams' 2001 validation study reported standard severity bands (0–4 minimal, 5–9 mild, 10–14 moderate, 15–19 moderately severe, 20–27 severe) and a recommended cut-off of 10 for further assessment.³

Mitchell and colleagues' 2016 meta-analysis pooled 40 primary-care studies of the PHQ-9 against clinical interview.⁵ At the cut-off of 10, sensitivity was around 0.80 and specificity around 0.85: broadly similar to HADS, in a different population, with a different question set. The PHQ-9 also includes a final item about suicidal thoughts, which is one reason clinicians often prefer it to a screener that doesn't.

GAD-7 was published by Spitzer, Kroenke, Williams, and Löwe in 2006 in Archives of Internal Medicine as a parallel anxiety instrument.⁴ Seven items, same 0–3 scale, total 0–21, with bands of 0–4 minimal, 5–9 mild, 10–14 moderate, 15–21 severe. The original validation study reported sensitivity 0.89 and specificity 0.82 at a cut-off of 10 against a structured interview for generalized anxiety disorder, and the questionnaire also performed reasonably well as a flag for panic, social anxiety, and PTSD. In other words, a high score signals an anxiety problem of some kind rather than GAD specifically.
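Because both instruments are scored the same way (sum the items, then look up a band), the scoring can be sketched with one table-driven helper. The band limits and the cut-off of 10 are the ones quoted above; the function names and the shape of the returned summary are assumptions made for illustration.

```python
# (inclusive upper bound, label) pairs, as quoted in the text above.
PHQ9_BANDS = [(4, "minimal"), (9, "mild"), (14, "moderate"),
              (19, "moderately severe"), (27, "severe")]
GAD7_BANDS = [(4, "minimal"), (9, "mild"), (14, "moderate"), (21, "severe")]

CUT_OFF = 10  # cut-off for further assessment quoted for both instruments


def total_score(items: list[int], expected_items: int) -> int:
    """Validate the responses (each 0-3) and return their sum."""
    if len(items) != expected_items:
        raise ValueError(f"expected {expected_items} items, got {len(items)}")
    if any(item not in (0, 1, 2, 3) for item in items):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(items)


def severity_band(total: int, bands: list[tuple[int, str]]) -> str:
    """Return the label of the first band whose upper bound covers the total."""
    if total < 0:
        raise ValueError("total cannot be negative")
    for upper_bound, label in bands:
        if total <= upper_bound:
            return label
    raise ValueError("total is above the instrument's maximum")


def phq9_summary(items: list[int]) -> dict:
    """Summarise a PHQ-9 administration; item 9 is reported separately."""
    total = total_score(items, 9)
    return {
        "total": total,
        "band": severity_band(total, PHQ9_BANDS),
        "at_or_above_cut_off": total >= CUT_OFF,
        # Any answer other than "not at all" (0) on the final item.
        "item9_nonzero": items[8] > 0,
    }


# Hypothetical responses, for illustration only.
print(phq9_summary([1, 2, 1, 2, 1, 1, 1, 1, 0]))
gad7_total = total_score([1, 1, 1, 1, 1, 1, 1], 7)
print(gad7_total, severity_band(gad7_total, GAD7_BANDS))  # 7 mild
```

Reporting the final PHQ-9 item on its own, rather than folding it into the total, keeps a non-zero answer visible even when the overall score is low.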

What screeners are good for, what they aren't

A screener is useful as the start of a conversation. A score above the cut-off is a structured reason to bring the question to a GP or therapist, with a number attached, in a form that clinician will recognise immediately. It is also useful as a tracker: scores measured at the same cadence, over weeks or months, show whether things are getting better, worse, or holding flat.

A screener is not useful as a self-diagnosis. The names invite that misreading; the validation papers explicitly do not support it. A diagnosis requires a clinician taking a history, ruling out medical causes, and assessing the pattern over time. The 0.80 specificity figure already implies as much: about one in five people who do not have the condition will still score above the cut-off. How many of the people above the cut-off actually have the condition depends on how common it is in the group being screened, and where it is uncommon, false positives can outnumber true positives.
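The share of above-cut-off scores that reflect a real case depends on prevalence as well as on sensitivity and specificity. A minimal sketch of that arithmetic (Bayes' rule) follows, using the roughly 0.80 figures quoted earlier. The prevalence values are illustrative assumptions, not estimates for any real population.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability that someone scoring above the cut-off has the condition."""
    true_pos = sensitivity * prevalence               # have it, flagged
    false_pos = (1 - specificity) * (1 - prevalence)  # don't have it, flagged
    return true_pos / (true_pos + false_pos)


# Same screener (0.80 / 0.80), three hypothetical prevalence levels.
for prevalence in (0.05, 0.10, 0.30):
    ppv = positive_predictive_value(0.80, 0.80, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}")
# prevalence 5%: PPV 0.17
# prevalence 10%: PPV 0.31
# prevalence 30%: PPV 0.63
```

With identical sensitivity and specificity, the same above-cut-off score is far more likely to be a false alarm in a low-prevalence group than in a high-prevalence one.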

A screener is also not useful in active crisis. PHQ-9 item 9 asks about thoughts of self-harm, but a questionnaire is not a substitute for immediate help. If the answer to that item is anything other than "not at all", the right next step is talking to a person, not retaking the test.

Pattern beats single score

A single PHQ-9 score taken on a bad afternoon is mostly noise. The questions ask about the past two weeks, but the act of completing the questionnaire is shaped by mood at the moment of completion, by recall bias, by what just happened that morning. The trend across several administrations carries far more information than any single result.

Most CBT therapists practising measurement-based care administer PHQ-9 and GAD-7 every two to four weeks during active treatment, and use the trajectory, not the absolute score, to decide whether the work is moving. The same logic applies to self-administered tracking. A score in the borderline range that has been stable for six months is different information from a score in the borderline range that has been climbing for six weeks, and only one of them is an alarm.
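One simple way to put a number on "trajectory" is the least-squares slope of the scores across administrations. The sketch below is an illustration under assumptions: the two score series are invented, evenly spaced administrations are assumed, and no slope value is offered as a clinical threshold.

```python
def slope_per_administration(scores: list[float]) -> float:
    """Least-squares slope of score against administration index (0, 1, 2, ...)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two administrations to see a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Two hypothetical series that both end in the HADS borderline range (8-10).
stable = [9, 8, 9, 9, 8, 9]
climbing = [4, 5, 7, 8, 9, 10]

for name, series in (("stable", stable), ("climbing", climbing)):
    slope = slope_per_administration(series)
    print(f"{name}: latest score {series[-1]}, slope {abs(slope):.2f} per administration")
```

The latest scores are nearly the same, but the slopes are not: the first series is flat while the second rises by a little over one point per administration.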

How Colors uses HADS

Colors includes the standard HADS questionnaire under the Tests section, with Normal / Borderline / Clinical bands matching Zigmond and Snaith's original cut-offs.¹ The intent is the same as the one the questionnaire was designed for: a structured way to check in periodically, especially when something feels off but it isn't clear whether it's situational or persisting.

A single HADS result in Colors is not a diagnosis. It is information you can take to a GP or therapist if the score is in the borderline or clinical range, or repeat in two to four weeks if you want to see whether the pattern is stable. The broader case for self-monitoring as part of a structured intervention, and where the evidence for it actually sits, is in the mood tracking research review.

Frequently asked questions

What is a validated mental health screener?

A validated screener is a short questionnaire with a published study showing it detects a condition reasonably well in a defined population. The Hospital Anxiety and Depression Scale (HADS) was published by Zigmond and Snaith in 1983; the PHQ-9 by Kroenke and colleagues in 2001; the GAD-7 by Spitzer and colleagues in 2006. Each was tested against a clinical reference standard, and each has a known sensitivity and specificity. A screener is not a diagnosis — it is a structured way to flag whether something deserves a closer clinical look.

What does a high HADS score mean?

Zigmond and Snaith's original 1983 cut-offs split each subscale into 0–7 (normal), 8–10 (borderline), and 11+ (clinical). Bjelland's 2002 review of 747 studies found sensitivity and specificity around 0.80 for both the anxiety and depression subscales at the standard cut-offs. A score in the borderline or clinical range means the symptoms are worth a clinical conversation, not that a diagnosis is confirmed.

Are PHQ-9 and GAD-7 better than HADS?

They measure overlapping but slightly different things. HADS was designed for non-psychiatric hospital outpatients and deliberately leaves out somatic items that overlap with physical illness. PHQ-9 maps directly onto DSM depression criteria. GAD-7 was designed for generalized anxiety in primary care. None is universally better — each has the population it was validated in. Mitchell's 2016 meta-analysis of 40 primary-care studies found PHQ-9 sensitivity around 0.80 and specificity around 0.85 at the standard cut-off of 10.

Can a screener diagnose me?

No. The names are misleading — the GAD-7 is the "Generalized Anxiety Disorder 7-item scale", which sounds diagnostic, but the original Spitzer 2006 paper is explicit that it is a screening and severity measure, not a diagnostic test. A high score means consider a clinical assessment. A clinician interview, history, and ruling out other causes are what produce a diagnosis.

How often should I retake a screener?

Most measurement-based CBT care uses PHQ-9 and GAD-7 every 2–4 weeks during active treatment to track change. For self-monitoring outside of therapy, a similar interval is reasonable. Daily retakes add noise without much signal, since the questions ask about the past two weeks. A single score is rarely informative on its own; the trend across several administrations is.

Not medical advice

This article is for informational and educational purposes only. It does not constitute medical advice and should not replace consultation with a licensed mental health professional. If you are in crisis, please contact emergency services in your country immediately.

Crisis lines: US — 988 Suicide & Crisis Lifeline · UK / Ireland — Samaritans 116 123 · EU — Befrienders Worldwide

Last reviewed: May 2026.

References

1. Zigmond, A. S., & Snaith, R. P. (1983). The Hospital Anxiety and Depression Scale. Acta Psychiatrica Scandinavica, 67(6), 361–370. doi:10.1111/j.1600-0447.1983.tb09716.x
2. Bjelland, I., Dahl, A. A., Haug, T. T., & Neckelmann, D. (2002). The validity of the Hospital Anxiety and Depression Scale: An updated literature review. Journal of Psychosomatic Research, 52(2), 69–77. doi:10.1016/S0022-3999(01)00296-3
3. Kroenke, K., Spitzer, R. L., & Williams, J. B. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606–613. doi:10.1046/j.1525-1497.2001.016009606.x
4. Spitzer, R. L., Kroenke, K., Williams, J. B., & Löwe, B. (2006). A brief measure for assessing generalized anxiety disorder: the GAD-7. Archives of Internal Medicine, 166(10), 1092–1097. doi:10.1001/archinte.166.10.1092
5. Mitchell, A. J., Yadegarfar, M., Gill, J., & Stubbs, B. (2016). Case finding and screening clinical utility of the Patient Health Questionnaire (PHQ-9 and PHQ-2) for depression in primary care: a diagnostic meta-analysis of 40 studies. BJPsych Open, 2(2), 127–138. doi:10.1192/bjpo.bp.115.001685