Who Developed the First Intelligence Test? A Historical Guide to Modern Psychometrics

Introduction

The question of who developed the first intelligence test is central to understanding modern cognitive assessment. While intelligence as a concept has been discussed for centuries, the systematic measurement of cognitive ability began in the early 20th century with Alfred Binet and Theodore Simon. Their groundbreaking work in 1905 established the foundational framework that shaped the intelligence tests developed since, from the Stanford-Binet to the Wechsler scales. Understanding this history is essential for anyone evaluating cognitive assessments today, because modern tests inherit much of their scientific architecture from Binet’s original methodology. This article traces the key figures, dates, and innovations that transformed intelligence testing from philosophical speculation into clinical science.

At a Glance: Quick Answer

| Question | Answer |
| --- | --- |
| Who developed the first intelligence test? | Alfred Binet and Theodore Simon created the Binet-Simon Scale in 1905 in France for the Ministry of Public Instruction. It consisted of 30 standardized tasks to identify children needing educational support. |
| When was it developed? | 1905 in Paris, France |
| What was its purpose? | To identify children requiring special educational intervention using objective, standardized measurement |
| How did it work? | Measured mental age by comparing a child’s cognitive performance to age-based norms: a child solving 8-year-old-level problems had a “mental age” of 8 |

The Birth of Intelligence Testing: Binet and Simon (1905)

Who Created the First Intelligence Test?

Alfred Binet (1857–1911), a French psychologist at the Sorbonne University, and Theodore Simon (1873–1961), a French physician, jointly developed the Binet-Simon Scale in 1905. This was the world’s first standardized intelligence test. Binet’s motivation was practical: the French Ministry of Public Instruction needed a method to identify children who required special educational support. Rather than relying on teacher intuition, Binet proposed a scientific measurement of mental ability.

The scale consisted of 30 tasks arranged in order of difficulty, measuring skills like memory, attention, reasoning, and verbal comprehension. Binet’s genius lay in his concept of “mental age”—the idea that a child’s cognitive performance could be compared to age-based norms. A child who solved problems typical of an 8-year-old was said to have a mental age of 8, regardless of chronological age.

Why Was the Binet-Simon Test Revolutionary?

The Binet-Simon Scale was revolutionary because it:

  • Introduced standardization: Tests were administered identically to all participants, with consistent scoring procedures—establishing the first psychometric standard.
  • Used normative data: Performance was compared to age-based reference groups, establishing the first psychological norms and normative comparison methodology.
  • Focused on practical outcomes: Rather than measuring abstract “general intelligence,” it predicted school performance—the actual problem Binet aimed to solve.
  • Grounded items in empirical tryouts: Binet and Simon tried the tasks out on schoolchildren of known ages and revised the scale according to the results, although their samples were small by modern standards.

Binet published revisions in 1908 and 1911, continuously refining the scale based on empirical evidence. This iterative approach became the standard methodology for test development. The conceptual framework Binet created—standardized tasks, age norms, practical validity—remains foundational to every intelligence test published today.

Learn more: Explore intelligence test scores to understand how modern assessments build on these foundational principles.

The American Adaptation: Lewis Terman and the Stanford-Binet (1916)

Who Adapted Binet’s Test for American Use?

Lewis Madison Terman (1877–1956), a psychologist at Stanford University in California, recognized the potential of Binet’s work for American educational systems. In 1916, Terman published the Stanford-Binet Intelligence Scale, an adaptation and extension of the original Binet-Simon test. Terman’s version introduced several critical innovations:

  • The IQ (Intelligence Quotient): Terman adopted the IQ concept, calculated as (Mental Age ÷ Chronological Age) × 100. This metric made scores more interpretable and comparable across ages.
  • Expanded item pool: The Stanford-Binet grew from 30 to over 90 items, improving measurement precision and domain coverage.
  • Larger normative sample: Terman tested roughly 1,000 American children, drawn mostly from California schools, establishing American norms separate from Binet’s Paris sample.
  • Broader age range: The test extended from about age 3 to adult levels.
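
The ratio formula in the first bullet can be sketched in a few lines of Python. This is an illustration only: the function name `ratio_iq` and the choice to express both ages in months are assumptions made for the example, not details taken from Terman's scoring procedure.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ = (mental age / chronological age) * 100.

    Hypothetical helper for illustration; both ages are given in months.
    """
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age_months / chronological_age_months * 100


# An 8-year-old performing at the level of a typical 10-year-old.
print(round(ratio_iq(10 * 12, 8 * 12)))   # 125

# A 10-year-old performing at the level of a typical 8-year-old.
print(round(ratio_iq(8 * 12, 10 * 12)))   # 80
```

A child whose mental age equals their chronological age scores exactly 100, which is why 100 became the reference point for "average."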

The Impact of Terman’s Work

Terman’s Stanford-Binet became the dominant intelligence test in the United States for decades. His work also sparked the widespread use of intelligence testing in schools, military selection (during World War I), and clinical psychology. The American Psychological Association formally endorsed standardized testing methodologies, making Terman’s approach the gold standard.

However, Terman’s testing was not without controversy—his test was sometimes used to justify eugenic policies, a historical fact that modern psychometrics explicitly rejects. This represents a critical juncture in the history of intelligence assessment: the recognition that tools designed for educational identification were misappropriated for harmful purposes.

Modern relevance: The Stanford-Binet underwent a major revision in 2003 (5th edition, published by Houghton Mifflin Harcourt) and now incorporates contemporary cognitive theory, measuring fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing, and working memory in both verbal and nonverbal formats. Reporting these factor scores alongside the overall IQ reduces the reliance on a single number that made historical misuse easier.

Take the test: Assess your IQ using modern adaptations with our Stanford-Binet Intelligence Test.

Expanding the Field: The Wechsler Scales and Beyond

Who Created Alternative Intelligence Tests?

David Wechsler (1896–1981), a clinical psychologist who spent much of his career at Bellevue Hospital in New York, published the Wechsler-Bellevue Intelligence Scale in 1939. It was followed by the Wechsler Intelligence Scale for Children (WISC) in 1949 and, in 1955, by a revised adult scale, the Wechsler Adult Intelligence Scale (WAIS). Wechsler’s approach differed significantly from Binet’s:

  • Focused on adults: While Binet’s test emphasized children in educational settings, Wechsler created assessments specifically designed for clinical diagnosis in adults and children.
  • Separated verbal and performance IQ: Wechsler’s scales divided intelligence into Verbal IQ (language-based reasoning, vocabulary, arithmetic) and Performance IQ (nonverbal, visual-spatial problem-solving, pattern completion). This dual-score approach highlighted that intelligence is multifaceted.
  • Broader conceptualization: Wechsler explicitly defined intelligence as “the aggregate or global capacity of the individual to act purposefully, to think rationally and to deal effectively with his environment.”

Wechsler’s framework directly influenced the modern understanding of multiple intelligences, which recognizes that cognitive abilities extend across distinct domains rather than concentrating in a single general factor.

Comparative Timeline of Major Intelligence Tests

| Test Name | Year | Developer(s) | Organization | Primary Use |
| --- | --- | --- | --- | --- |
| Binet-Simon Scale | 1905 | Alfred Binet & Theodore Simon | French Ministry of Public Instruction | Educational placement (children) |
| Stanford-Binet | 1916 | Lewis Terman | Stanford University | Educational & clinical assessment (all ages) |
| Wechsler-Bellevue Scale (forerunner of the WAIS) | 1939 | David Wechsler | Bellevue Hospital | Clinical diagnosis (adults) |
| WISC | 1949 | David Wechsler | Psychological Corporation | Clinical & educational (children) |
| Cognitive Assessment System (CAS) | 1997 | Jack Naglieri & Jagannath P. Das | Riverside Publishing | Planning, attention, simultaneous, successive processing |

Historical Milestones in Intelligence Test Development

Key Innovations After Wechsler

The decades following Wechsler’s pioneering work saw several transformative developments:

1960s–1980s: Cognitive Theory Integration

Psychologists began embedding cognitive theory directly into test design. Rather than simply producing scores, tests were mapped onto theoretical frameworks from cognitive science. The PASS model (Planning, Attention, Simultaneous processing, Successive processing) took shape in this period, and the Cognitive Assessment System (CAS), developed by Jack Naglieri and Jagannath P. Das and published in 1997, later turned it into a full test battery. The PASS model parallels contemporary understanding of processing speed and working memory as distinct cognitive capacities.

1990s–2000s: Factor Analysis and Structural Refinement

Advances in statistical methods allowed researchers to identify the underlying structure of intelligence more precisely. The Cattell-Horn-Carroll (CHC) theory, which merges Raymond Cattell and John Horn’s fluid-crystallized theory with John Carroll’s three-stratum model, became the dominant theoretical framework. This period established the foundation for understanding abilities like verbal intelligence, numerical intelligence, and abstract reasoning as measurable, distinct constructs.

The American Educational Research Association, American Psychological Association, and National Council on Measurement in Education jointly published updated Standards for Educational and Psychological Testing, formalizing the scientific requirements for modern assessments.

2000s–Present: Digital and Culturally Responsive Assessments

Contemporary tests increasingly:

  • Use digital administration and scoring platforms
  • Account for cultural and linguistic diversity in task design and interpretation
  • Measure processing speed and working memory as distinct cognitive domains (not single-score aggregates)
  • Incorporate ecological validity—predicting real-world performance, not just test scores

Explore: Discover how modern tests measure distinct cognitive abilities in our cognitive domains overview.

Alfred Binet vs. Lewis Terman: Philosophical and Methodological Differences

Deep-Dive Comparison: Core Contrasts

While both Binet and Terman shaped modern intelligence testing, their approaches differed meaningfully—differences still evident in contemporary test design:

| Aspect | Binet (1905) | Terman (1916) |
| --- | --- | --- |
| Purpose | Identify children needing educational support | Measure general cognitive ability across populations for ranking/screening |
| Conceptual basis | Practical problem-solving in school context | Underlying “g” (general intelligence factor) |
| Normative sample size | ~200 Paris schoolchildren | Roughly 1,000 American children |
| Sample composition | Urban French schoolchildren | Mostly California schoolchildren |
| Score interpretation | Mental Age vs. Chronological Age (qualitative) | IQ Ratio: (MA ÷ CA) × 100 (quantitative) |
| Formula used | No standardized metric initially | IQ = (Mental Age ÷ Chronological Age) × 100 |
| Theoretical stance | Intelligence is multifaceted, context-dependent | Intelligence is largely unitary, measurable by single score |
| Use trajectory | Diagnostic/clinical tool for identification | Population screening, ranking, and classification |
| Intended users | Educators and school psychologists | Military, industry, population geneticists |

Binet’s original intent was identification of educational need, while Terman’s adaptation enabled population comparison and ranking. This distinction remains critical today: modern intelligence tests serve both purposes, but clinicians must be clear about whether they’re using tests for diagnosis (Binet’s original intent) or ranking (Terman’s adaptation). This misalignment between design and application became a significant vulnerability for test misuse in later decades.

The Technical Evolution: Mental Age to Deviation IQ

The Shift from Mental Age to Deviation IQ

One of the most important technical developments in intelligence testing history involved how scores were calculated and interpreted. This transition is crucial for understanding why modern tests (WISC-V, SB5) are fundamentally more robust than historical instruments.

Mental Age Scoring (Binet-Simon, 1905–1930s)

The original Binet-Simon method assigned children a mental age based on the most difficult items they successfully completed. A 10-year-old who passed all tasks typical of a 12-year-old was said to have a mental age of 12. Terman’s IQ ratio, (Mental Age ÷ Chronological Age) × 100, tried to standardize this comparison, but mental age scoring had a critical flaw: it became increasingly unreliable at older ages. Performance on the test items levels off by early adulthood while chronological age keeps rising, so an adult’s ratio falls with every birthday even when ability is unchanged. An adult’s mental age could be calculated, but the concept lost practical meaning for predicting adult outcomes.
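
To make the adult-age problem concrete, the sketch below holds mental age constant once it stops rising and lets chronological age keep growing. The plateau at age 16 is an assumption chosen purely for illustration, and `ratio_iq` is a hypothetical helper, not part of any published scoring manual.

```python
PLATEAU_AGE = 16  # assumed age at which measured mental age stops rising


def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ = (mental age / chronological age) * 100 (ages in years)."""
    return mental_age / chronological_age * 100


# The same person, with unchanged ability, scored at four different ages.
ratios = {}
for age in (16, 20, 32, 64):
    mental_age = min(age, PLATEAU_AGE)  # mental age is capped at the plateau
    ratios[age] = round(ratio_iq(mental_age, age))
    print(f"age {age}: ratio IQ {ratios[age]}")
# age 16: ratio IQ 100
# age 20: ratio IQ 80
# age 32: ratio IQ 50
# age 64: ratio IQ 25
```

Under these assumptions the score drops from 100 to 25 with no change in ability at all, which shows why a ratio of ages cannot describe adult performance.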

Deviation IQ Scoring (Wechsler, 1939–Present)

David Wechsler introduced a methodological revolution: instead of assigning mental ages, he calculated deviation IQ by comparing an individual’s raw scores to the statistical distribution of their age group.

  • Deviation IQ formula: IQ = 100 + 15 × z, where z is the number of standard deviations the individual’s score lies above or below the mean of their age group
  • Advantage: Works equally well for children and adults
  • Statistical property: Maintains a mean of 100 and standard deviation of 15 across all ages
  • Reliability: Eliminates the age-related ceiling effects that plagued mental age scoring
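
The deviation formula can be sketched as follows. This is a simplified illustration under stated assumptions: the norm-group scores are made up, the function name `deviation_iq` is hypothetical, and published tests convert raw scores through norm tables rather than through a single linear formula like this one.

```python
from statistics import mean, stdev


def deviation_iq(raw_score, age_group_scores, iq_mean=100, iq_sd=15):
    """Convert a raw score to a deviation IQ relative to one age group.

    z measures how many standard deviations the raw score lies from the
    age-group mean; the IQ scale then rescales z to mean 100, SD 15.
    """
    mu = mean(age_group_scores)
    sigma = stdev(age_group_scores)
    z = (raw_score - mu) / sigma
    return iq_mean + iq_sd * z


# Made-up raw scores for a small norm group of same-age test takers.
norm_group = [38, 42, 45, 50, 50, 55, 58, 62]

print(deviation_iq(50, norm_group))            # a score at the group mean maps to 100
print(round(deviation_iq(58, norm_group), 1))  # above the mean, so above 100
print(round(deviation_iq(42, norm_group), 1))  # below the mean, so below 100
```

Because the comparison is always against people of the same age, the same function works for a 7-year-old and a 70-year-old; only the norm group changes.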

This technical innovation—moving from qualitative mental age comparisons to standardized statistical deviation scoring—made intelligence testing scientifically rigorous and predictively valid across the entire lifespan. Modern tests like the WISC-V (published by Pearson, 2014) and Stanford-Binet 5 (published by Houghton Mifflin Harcourt, 2003) use deviation IQ exclusively, with carefully calibrated norms across diverse populations.

Learn more: Understand how intelligence quotient test definitions apply these scoring principles today.

Historical Controversy and Modern Ethical Revisions

The Misuse of Early Intelligence Tests: The Eugenics Movement

Intelligence testing’s history includes a painful chapter: the misappropriation of early tests for eugenic purposes. This historical misuse is essential context for understanding how modern psychometrics has fundamentally restructured to prevent recurrence.

Historical Misuse (1920s–1950s)

Lewis Terman’s Stanford-Binet, while originally designed for educational identification, became a tool for population-level classification. In the 1920s–1950s, some psychologists and policymakers used IQ test results to justify eugenic policies, including forced sterilization programs in the United States and Canada, and more catastrophically, discriminatory racial theories that contributed to atrocities in Nazi Germany. The concept of “general intelligence” (the “g” factor that Terman favored) was misinterpreted as genetically fixed and racially determined, a fundamentally inaccurate reading of the science that nevertheless caused immense harm.

The Critical Error: These misapplications confused:

  • Educational prediction (what tests actually measure) with biological destiny (what some falsely claimed tests proved)
  • Group statistical differences (which reflect differential access to education and socioeconomic advantage) with group-level genetic differences (which the evidence does not support)

Modern Psychometrics’ Response: Ethical Reconstruction

Contemporary intelligence tests have been explicitly rebuilt to address these historical failures. The American Psychological Association, Pearson Assessment (the major test publisher), and academic psychometricians at leading universities have implemented structural changes:

Stanford-Binet, Fifth Edition (SB5, 2003)

  • Eliminated single-score emphasis: Modern versions provide multiple domain scores, preventing the reduction of intelligence to one number.
  • Culturally responsive norming: Norms established with diverse samples accounting for socioeconomic status, ethnicity, and linguistic background.
  • Bias reduction in items: Test items explicitly reviewed for cultural or linguistic unfairness; items that showed differential performance by demographic groups were removed or revised.
  • Ecological validity: Focus shifted to predicting real-world academic and occupational success, not claiming to measure “innate ability.”

Wechsler Intelligence Scale for Children, Fifth Edition (WISC-V, 2014)

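  • Index scores alongside the global score: The WISC-V reports five primary index scores in addition to the Full Scale IQ (the General Ability Index remains available as an ancillary score), so interpretation does not rest on a single number.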
  • Separate processing speed measurement: Processing speed is now measured independently (not conflated with reasoning), preventing speed-related biases from distorting cognitive ability assessment.
  • Diverse normative sample: 2,200 children from varied racial, ethnic, and socioeconomic backgrounds.
  • Clinical guidance on fairness: Test manuals explicitly instruct clinicians on avoiding biased interpretation and considering context in score meaning.

Key Principle: Modern tests are tools for identification and support, not classification or ranking. A high IQ score means “this individual has scored well on this particular test, suggesting they might benefit from advanced academic coursework”—not “this person is inherently superior.”

The Role of Professional Organizations

The American Psychological Association, through its Division 15 (Educational Psychology) and Division 5 (Evaluation, Measurement, and Statistics), actively monitors test usage and publishes guidance on ethical practice. The National Association of School Psychologists similarly emphasizes that intelligence tests must be used to identify needs (consistent with Binet’s original intent) rather than to rank or classify populations (the harmful application).

Understand context: Learn how theory of multiple intelligences represents the modern synthesis of intelligence research, moving beyond single-score models entirely.

The Science Behind the Tests: What Made Them Valid?

Standardization and Normative Data: The Foundation of Reliability

Both Binet and Terman understood that a test’s value depends on standardization—administering tests identically to large, representative samples. This principle remains non-negotiable in modern test design.

  • Binet’s normative sample: Approximately 200 children in Paris schools (representative of early 20th-century French urban education)
  • Terman’s normative sample: Roughly 1,000 American children, drawn mostly from California schools (an early large-scale U.S. normative effort)
  • Modern test requirements: 1,500–3,000+ participants to establish robust norms across age, gender, race/ethnicity, socioeconomic status, and regional location

Pearson (which publishes WISC-V), Houghton Mifflin Harcourt (which publishes Stanford-Binet 5), and other major publishers now conduct stratified sampling so that demographic representation is proportional to U.S. Census data. Understanding how intelligence test scores are calculated depends entirely on normative data, a direct legacy of both Binet and Terman’s methodologies.
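
Proportional stratified sampling can be sketched as a quota calculation. Everything specific here is an assumption for illustration: the function name `allocate_sample`, the regional strata, and the percentage shares are made up and are not actual Census figures or any publisher's sampling plan.

```python
def allocate_sample(total: int, percent_shares: dict[str, int]) -> dict[str, int]:
    """Split a normative sample across strata in proportion to population share.

    Shares are whole percentages that must sum to 100. Any participants left
    over after rounding down go to the strata with the largest remainders.
    """
    if sum(percent_shares.values()) != 100:
        raise ValueError("percentage shares must sum to 100")

    quotas = {group: total * pct // 100 for group, pct in percent_shares.items()}
    remainders = {group: total * pct % 100 for group, pct in percent_shares.items()}

    leftover = total - sum(quotas.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for group in by_remainder[:leftover]:
        quotas[group] += 1
    return quotas


# Made-up regional shares, for illustration only.
shares = {"Northeast": 17, "Midwest": 21, "South": 38, "West": 24}
print(allocate_sample(2200, shares))
# {'Northeast': 374, 'Midwest': 462, 'South': 836, 'West': 528}
```

In practice publishers stratify on several variables at once (for example age, sex, region, and parental education), but the principle is the same: each cell's quota mirrors its share of the population.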

Reliability and Validity: The Pillars of Psychometric Science

Reliability (consistency) and validity (measuring what the test claims to measure) are the two foundational pillars of psychometric science. Both concerns were already present in Binet’s work and have been mathematically formalized over the century since.

  • Test-retest reliability: Scores should remain stable when the same person is tested again weeks or months later. A common rule of thumb for tests used in individual decisions is a reliability coefficient of about .90 or higher (indicating that roughly 90% of score variance is consistent rather than random).
  • Criterion-related validity: Terman showed that Stanford-Binet scores were associated with school grades and teacher ratings of academic ability. Modern tests must demonstrate correlations with educational achievement, occupational success, and other intelligence-related outcomes.
  • Construct validity: Modern tests validate that they measure cognitive constructs (reasoning, memory, processing speed) that cognitive science identifies as real, independent phenomena.
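
Test-retest reliability is usually summarized as a Pearson correlation between two administrations of the same test. The sketch below computes that coefficient from scratch; the function name `pearson_r` and the two score lists are hypothetical and exist only to show the calculation.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")

    mean_first = sum(first) / n
    mean_second = sum(second) / n

    # Sum of cross-products and sums of squared deviations from each mean.
    cross = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    ss_first = sum((a - mean_first) ** 2 for a in first)
    ss_second = sum((b - mean_second) ** 2 for b in second)

    return cross / sqrt(ss_first * ss_second)


# Made-up scores for six people tested twice, a few months apart.
time_1 = [95, 102, 110, 88, 120, 105]
time_2 = [97, 100, 112, 90, 118, 107]

r = pearson_r(time_1, time_2)
print(f"test-retest r = {r:.2f}")
```

With these invented numbers the coefficient comes out above .90, which is the range expected of an instrument used for individual decisions.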

All contemporary intelligence tests undergo rigorous validation before publication. The Standards for Educational and Psychological Testing (published jointly by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education) set the benchmark for acceptable reliability and validity. The Standards are professional guidelines rather than law, but major publishers and test users are expected to follow them.

Modern Intelligence Testing: Built on Historical Foundations

How Binet’s Framework Persists Today

Despite a century of refinement, Binet’s core principles remain foundational:

  1. Standardized administration: Every test taker receives identical instructions and timing.
  2. Normative comparison: Scores are interpreted by comparing an individual’s performance to age-matched groups.
  3. Practical validity: Intelligence tests predict educationally and occupationally relevant outcomes.
  4. Iterative revision: Tests are regularly updated as cognitive theory advances and demographic composition shifts.

Contemporary Tests and Their Lineage

  • Stanford-Binet, Fifth Edition (2003): Published by Houghton Mifflin Harcourt. Direct descendant of Terman’s 1916 adaptation, now measuring five cognitive domains: fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing, and working memory. Includes fluid intelligence and crystallized intelligence subtests directly derived from contemporary cognitive science.
  • WISC-V (2014): Published by Pearson. Modern evolution of Wechsler’s framework, measuring verbal comprehension, visual-spatial reasoning, fluid reasoning, working memory, and processing speed. It is among the most widely used clinical tests for children.
  • Cognitive Assessment System, Second Edition (CAS2, 2014): Developed by Jack Naglieri, Jagannath P. Das, and Sam Goldstein. Represents integration of cognitive psychology into test design, measuring planning, attention, and simultaneous/successive processing based on the PASS model.

Each of these tests can trace its lineage directly back to Binet and Simon’s 1905 innovation. Assess your cognitive profile with our IQ test or adult IQ test.

Frequently Asked Questions

What Tests Are Similar to the Binet-Simon Scale?

Modern intelligence tests that share Binet’s foundational approach include:

  • Wechsler scales (WAIS, WISC): Emphasis on verbal and nonverbal reasoning, published by Pearson
  • Cognitive Assessment System (CAS): Emphasis on planning and attention, developed by Jack Naglieri and Jagannath P. Das
  • Woodcock-Johnson Tests of Cognitive Ability: Emphasis on broad cognitive abilities based on CHC theory, published by Riverside Insights
  • Kaufman Assessment Battery for Children (KABC): Emphasis on sequential and simultaneous processing

All derive from Binet’s core principle: standardized assessment of cognitive ability compared to age norms. Explore different intelligence assessment approaches with our pattern recognition test, analytical intelligence test, and practical intelligence test.

Is the Stanford-Binet Still Used Today?

Yes. The Stanford-Binet Intelligence Scales, Fifth Edition (SB5, 2003), published by Houghton Mifflin Harcourt, is widely used in educational and clinical settings. It measures five cognitive domains: fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing, and working memory. Modern revisions incorporate contemporary cognitive psychology while maintaining continuity with Terman’s original framework. The SB5 is one of the most commonly used individually administered intelligence tests for children and adults in clinical psychology, school psychology, and educational assessment.

What Is the Difference Between the Binet-Simon Scale and the Stanford-Binet?

The Binet-Simon Scale (1905) was the original French test with 30 items and mental age scoring. The Stanford-Binet (1916) was Lewis Terman’s American adaptation, which:

  • Extended the test to cover a broader age range (about age 3 to adulthood)
  • Introduced the IQ metric, (Mental Age ÷ Chronological Age) × 100
  • Expanded the item pool from 30 to 90 items
  • Established American normative data with roughly 1,000 children
  • Positioned intelligence testing as a tool for educational and clinical decision-making

While the Binet-Simon is historically significant, the Stanford-Binet became the practical standard in the United States and internationally.

Who Invented the IQ Score?

Lewis Terman popularized the IQ (Intelligence Quotient) score in 1916 with the Stanford-Binet test. However, the concept originated with Wilhelm Stern, a German psychologist who in 1912 proposed dividing mental age by chronological age to obtain a “mental quotient.” Terman multiplied that ratio by 100 to remove the decimal and standardized the metric for widespread use in America, which helped make IQ the global standard for score reporting. Today, understanding your intelligence quotient starts with this foundational metric, even though modern tests compute IQ as a deviation score rather than a ratio.

When Was the First Intelligence Test Developed?

The first intelligence test was developed in 1905 by Alfred Binet and Theodore Simon in France. The Binet-Simon Scale consisted of 30 tasks designed to identify children who needed special educational intervention. It was commissioned by the French Ministry of Public Instruction and published in the journal L’Année Psychologique (The Psychological Year).

Expert Perspective: Why History Matters for Modern Assessment

Clinical and educational psychologists rely on understanding intelligence test history for three critical reasons:

  1. Interpreting score validity: Understanding that Stanford-Binet scores were originally designed to predict school performance helps clinicians avoid over-interpreting results as measures of “innate ability” or “genetic potential.”
  2. Recognizing theoretical evolution: Modern tests explicitly incorporate cognitive science (working memory, processing speed, reasoning subtypes). Clinicians must understand that a contemporary WISC-V profile is fundamentally more sophisticated—and more ethically defensible—than Terman’s global IQ score.
  3. Avoiding historical misuse: Intelligence testing has a complicated history involving eugenic applications and demographic bias. Modern practice requires awareness of how tests were misused historically to prevent contemporary misapplication. The American Psychological Association and National Association of School Psychologists explicitly train clinicians in this history as part of professional competency.

The field of psychometrics has moved decisively toward multifactorial models (recognizing multiple cognitive domains) and culturally responsive assessment (accounting for language, socioeconomic background, and test-taking experience). These advances rest directly on the empirical foundations laid by Binet, Terman, and Wechsler while explicitly rejecting the historical misuses that plagued earlier applications.

Understanding the theory of multiple intelligences represents the modern synthesis of this historical progression toward recognizing intelligence’s inherent complexity and multifaceted nature. Contemporary intelligence assessment is not a single number—it’s a nuanced profile of cognitive strengths and developmental areas, designed to support, not classify.

Ready to Assess Your Cognitive Abilities?

Our comprehensive intelligence tests are based directly on the scientific principles established by Binet (1905), refined by Terman (1916) and Wechsler (1939), and continuously updated to reflect contemporary cognitive science and ethical best practices. Discover your strengths across multiple cognitive domains—a direct descendant of the pioneering work that began in Paris over 120 years ago.
