Reliability vs Validity in Psychological Assessment: Key Differences

Quick Answer

Reliability refers to the consistency of an assessment — whether it produces stable results across time and conditions. Validity refers to whether an assessment actually measures what it claims to measure. An assessment can be reliable without being valid (consistently measuring the wrong thing), but cannot be valid without being reliable. Both are essential quality criteria for any psychological tool.

What Is Reliability?

Reliability is the consistency and stability of a measurement instrument. A reliable assessment produces similar results under similar conditions — if you take it today and again in two weeks, the scores should be comparable (assuming nothing meaningful has changed about you in the interim).

Psychologists assess reliability in several ways. Internal consistency, most commonly estimated with Cronbach’s alpha, captures whether the items within a scale hang together as measures of the same underlying construct; by common convention, alpha ≥ 0.70 is considered acceptable for research and ≥ 0.80 for applied use. Test-retest reliability measures stability over time, usually as the correlation between scores from two administrations several weeks apart. Inter-rater reliability (for assessments requiring human judgment) measures agreement across different raters, often summarized with Cohen’s kappa or an intraclass correlation.
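The three reliability indices named above can each be computed in a few lines. The sketch below is a minimal illustration, assuming complete data with no missing responses; the function names and all of the data are hypothetical, and a real analysis would normally rely on an established statistics package.

```python
from collections import Counter
from statistics import mean, variance
import math


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items matrix (list of rows)."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])  # variance of the scale totals
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def pearson_r(x, y):
    """Pearson correlation, e.g. between time-1 and time-2 scores (test-retest)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters' category labels, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical data: 5 respondents answering a 3-item scale (1-5 ratings).
responses = [[4, 5, 4], [2, 3, 2], [3, 3, 4], [5, 4, 5], [1, 2, 1]]
print(round(cronbach_alpha(responses), 2))  # 0.94

# Hypothetical scale totals for the same 5 people, two weeks apart.
time1 = [13, 7, 10, 14, 4]
time2 = [12, 8, 11, 14, 5]
print(round(pearson_r(time1, time2), 2))

# Hypothetical yes/no judgments by two clinicians on 4 cases.
print(cohen_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # 0.5
```

Kappa is used for the rater example rather than raw percent agreement because two raters who label most cases "no" will agree often by chance alone.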

A reliable assessment is like a precise measuring tape: it gives consistent readings. But a precise tape that is miscalibrated still gives consistently wrong measurements.

What Is Validity?

Validity is whether an assessment actually measures what it claims to measure — the most fundamental quality criterion in psychological assessment. An assessment can be perfectly consistent (reliable) while consistently measuring something other than its intended construct (invalid).

Validity has multiple dimensions. Construct validity is whether the test measures the theoretical construct it purports to measure (e.g., does an emotional intelligence test actually measure emotional intelligence?). Content validity is whether the items adequately cover the domain being measured. Criterion validity is whether scores relate to relevant outcomes; it includes concurrent validity (scores correlate with a criterion measured at the same time) and predictive validity (scores predict outcomes measured later). Discriminant validity is whether scores avoid correlating too highly with measures of constructs the test should be distinguishable from.
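Criterion and discriminant validity are usually examined as correlations. The sketch below assumes made-up scores for five people, a hypothetical criterion the test should predict, and a hypothetical construct it should not track; it only illustrates the comparison, not a full validation study, which would need far larger samples and many more measures.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical data for 5 people (illustrative only, not real study results).
test_scores = [10, 12, 15, 18, 20]          # new scale under evaluation
criterion = [2.0, 3.1, 2.6, 3.9, 3.8]       # outcome the test should predict
distinct_construct = [5, 3, 6, 4, 5]        # construct the test should NOT track

criterion_r = pearson_r(test_scores, criterion)
discriminant_r = pearson_r(test_scores, distinct_construct)
print(f"criterion r = {criterion_r:.2f}, discriminant r = {discriminant_r:.2f}")

# The pattern we hope to see: the criterion correlation clearly exceeds the
# correlation with the construct the test should be distinguishable from.
print("pattern supports validity:", abs(criterion_r) > abs(discriminant_r))
```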

Key Differences

Dimension	Reliability	Validity
Core question	Is it consistent?	Does it measure what it claims?
Key types	Internal consistency, test-retest, inter-rater	Construct, content, criterion, discriminant
Relationship	Necessary but not sufficient for validity	Requires reliability as a prerequisite
Can exist without the other?	Yes: reliable but invalid tests exist	No: valid tests must also be reliable
Measured by	Cronbach’s alpha, correlation coefficients	Factor analysis, correlation with criteria, expert review
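The claim that reliability is necessary but not sufficient has a precise form in classical test theory: a test's correlation with any criterion cannot exceed the square root of the product of the two measures' reliabilities. A minimal sketch, assuming the classical true-score model; `max_validity` is a hypothetical helper name.

```python
import math


def max_validity(test_reliability, criterion_reliability=1.0):
    """Classical test theory ceiling on the observed test-criterion correlation:
    r_xy <= sqrt(r_xx * r_yy)."""
    return math.sqrt(test_reliability * criterion_reliability)


# Assuming a perfectly reliable criterion, so only the test's reliability limits validity.
for rxx in (0.0, 0.49, 0.81):
    print(f"reliability {rxx:.2f} -> validity can be at most {max_validity(rxx):.2f}")
```

A test with zero reliability cannot correlate with anything, while high reliability only raises the ceiling; it does not guarantee that the test reaches it, which is why reliable tests can still be invalid.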

A Classic Illustration

Imagine an archer shooting at a target. Reliable but not valid: all arrows cluster tightly in the top-left corner, consistent but missing the center. Unreliable: arrows scatter widely around the center; their average lands near the bullseye, but no single shot can be trusted, which is why an unreliable test cannot support valid conclusions about an individual's score. Reliable and valid: arrows cluster tightly in the center, consistent and accurate. Good psychological assessment aims for this third scenario.
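The archer picture maps onto two separate quantities: bias (systematic error, how far the average hit lands from the bullseye) and spread (random error, how widely the hits scatter). The simulation below is an illustrative sketch with arbitrary, hypothetical offsets and scatter values, not a model of any particular test.

```python
import math
import random


def shoot(aim, spread, n, rng):
    """Simulate n arrows: systematic offset `aim` plus random scatter `spread` (SD per axis)."""
    ax, ay = aim
    return [(rng.gauss(ax, spread), rng.gauss(ay, spread)) for _ in range(n)]


def summarize(shots):
    """Return (bias, spread) for a list of (x, y) hits; the bullseye is at (0, 0)."""
    n = len(shots)
    mx = sum(x for x, _ in shots) / n
    my = sum(y for _, y in shots) / n
    bias = math.hypot(mx, my)  # distance of the average hit from the bullseye
    # RMS distance of each hit from the cluster's own centre
    spread = math.sqrt(sum((x - mx) ** 2 + (y - my) ** 2 for x, y in shots) / n)
    return bias, spread


rng = random.Random(42)
results = {
    "reliable, not valid": summarize(shoot((-3.0, 3.0), 0.2, 2000, rng)),  # tight, top-left
    "unreliable": summarize(shoot((0.0, 0.0), 2.0, 2000, rng)),            # wide, centred
    "reliable and valid": summarize(shoot((0.0, 0.0), 0.2, 2000, rng)),    # tight, centred
}
for name, (bias, spread) in results.items():
    print(f"{name:20s} bias={bias:.2f}  spread={spread:.2f}")
```

Reliability corresponds to small spread, but only small bias and small spread together describe the tight, centred cluster.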

Related Pages

Our Methodology: How we evaluate reliability and validity in our assessments
Assessment Standards: Quality criteria including reliability thresholds
How Results Are Calculated: Transparent scoring and interpretation

Frequently Asked Questions

Why do popular tests sometimes have low reliability?

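Popularity and scientific quality are different things. The MBTI, for example, is widely used but has documented test-retest reliability problems: a substantial share of people receive a different type when retested only weeks later. Because the MBTI sorts people into either/or categories, anyone scoring near the midpoint of a dimension can switch type through small fluctuations in measurement, so these changes largely reflect measurement inconsistency rather than genuine personality change.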
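Why an unchanged trait can still yield a different type on retest is easy to see in a toy simulation. This is a simplified, hypothetical model of a single either/or dimension (a continuous score cut at zero, with normally distributed measurement error), not the MBTI's actual scoring procedure; the error size and band widths are assumptions.

```python
import random


def simulate_retest(n_people=10_000, error_sd=0.5, near_band=0.25, seed=1):
    """Share of people whose either/or type changes between two sessions
    even though their underlying trait score is identical at both.
    With true SD 1 and error SD 0.5, the continuous score's reliability is
    1 / (1 + 0.5**2) = 0.8."""
    rng = random.Random(seed)
    flips = {"all": [0, 0], "near cutoff": [0, 0], "far from cutoff": [0, 0]}
    for _ in range(n_people):
        true_score = rng.gauss(0.0, 1.0)                     # stable trait, no real change
        type_1 = true_score + rng.gauss(0.0, error_sd) >= 0  # category at session 1
        type_2 = true_score + rng.gauss(0.0, error_sd) >= 0  # category at session 2
        groups = ["all"]
        if abs(true_score) < near_band:
            groups.append("near cutoff")
        elif abs(true_score) > 1.0:
            groups.append("far from cutoff")
        for g in groups:
            flips[g][0] += type_1 != type_2  # count a type change
            flips[g][1] += 1                 # count the person
    return {g: changed / total for g, (changed, total) in flips.items()}


for group, rate in simulate_retest().items():
    print(f"{group:16s} type changed for {rate:.0%} of people")
```

In this toy model, people whose true score sits near the cutoff change type far more often than those well away from it, even though nobody's trait changes between sessions and the underlying continuous score is fairly reliable.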

What reliability level is acceptable?

For research purposes, Cronbach’s alpha ≥ 0.70 is generally acceptable. For applied, high-stakes uses (employment decisions, clinical assessment), ≥ 0.80–0.90 is preferred. Below 0.70, measurement error is high enough to substantially undermine confidence in individual scores.
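The link between a reliability coefficient and confidence in an individual score is usually expressed through the standard error of measurement (SEM) from classical test theory, SEM = SD × √(1 − reliability). A minimal sketch, assuming a hypothetical scale with a standard deviation of 15 points and treating alpha as the reliability estimate; the function names are illustrative.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: typical size of the error in one person's observed score."""
    return sd * math.sqrt(1 - reliability)


def band_95(sd, reliability):
    """Half-width of an approximate 95% band around an individual observed score."""
    return 1.96 * standard_error_of_measurement(sd, reliability)


# Hypothetical scale with SD = 15 points, treating alpha as the reliability estimate.
for alpha in (0.70, 0.80, 0.90):
    sem = standard_error_of_measurement(15, alpha)
    print(f"alpha={alpha:.2f}  SEM={sem:.1f}  95% band=±{band_95(15, alpha):.1f} points")
```

On that hypothetical scale, alpha = 0.70 leaves an approximate 95% band of about ±16 points around an individual score, while alpha = 0.90 narrows it to about ±9.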

Do our assessments report reliability data?

Yes. We document the reliability evidence for assessments on our platform and cite the research behind each instrument. See our Methodology page for our reliability standards and validation process.
