Testing Creativity
A deep dive into how creativity is tested, why it's hard to measure, and how seven major assessments capture different dimensions of creative thinking and potential.
Introduction
In a world increasingly driven by innovation, disruption, and rapid change, creativity has emerged as one of the most critical skills of the 21st century. It fuels everything from scientific breakthroughs to technological innovation, artistic expression, and problem-solving in everyday life. As such, schools, companies, and policymakers have become more interested in how to nurture, evaluate, and reward creativity — not just as a vague trait, but as a measurable capacity. Yet, while we test for literacy, numeracy, and logic with standardized rigor, creativity remains one of the most undervalued and misunderstood dimensions in traditional assessments.
The growing emphasis on creativity in education and the workplace raises a vital question: how do we measure something as dynamic and multifaceted as creativity? Unlike math or vocabulary, creativity doesn't have a single right answer. It is inherently open-ended, often subjective, and highly context-dependent. What is considered original or useful in one culture, domain, or moment might not be viewed the same way elsewhere. This makes creativity difficult to define and even harder to quantify with fairness and consistency.
Compounding this challenge is the fact that creativity isn't one thing — it's many things. It includes divergent thinking, originality, visual imagination, emotional expressiveness, practical problem-solving, and even personality traits like openness and risk tolerance. Some people may excel in generating wild ideas; others in refining them into useful innovations. A child’s creative potential might not show up in test scores but could be obvious in how they build, draw, or tell stories. Traditional standardized tests, which reward rule-following and convergence, often leave these dimensions untouched — or worse, penalize them.
Over the past several decades, psychologists and educators have developed a variety of tools to capture different facets of creativity, ranging from verbal fluency tests to abstract drawing tasks and preference scales. Each test reflects a different theory of what creativity is and how it should be recognized. Some are designed for speed and scalability; others for nuance and depth. Yet even the best assessments come with trade-offs — in what they measure, how fairly they do it, and how useful their results are in real-world settings.
This article explores some of the most influential and widely used tests of creativity, from classics like the Torrance Tests to newer models like CANU. By breaking down what each test measures, how well it does so, and what it misses, we can better understand not only how to assess creativity — but how to respect its complexity. In doing so, we move closer to a world where creative thinking is not only celebrated but also seen, supported, and meaningfully cultivated.
Core Dimensions of Creativity That Can Be Tested
Below is a breakdown of the main dimensions (or sub-skills) of creativity that psychologists and educators aim to test. These are drawn from foundational theories (e.g., Guilford, Torrance, Amabile, Sternberg) and modern research.
Summary of Major Creativity Tests
🧪 Test Coverage Matrix: How Well Each Test Covers Each Dimension
The following table shows how extensively each creativity test covers these dimensions. It uses a 3-level scale:
🟢 Strong Coverage
🟡 Moderate Coverage
🔴 Little or No Coverage
Insights from the Coverage Matrix
Most Commonly Measured Skills:
Fluency, originality, and divergent thinking are the core of almost all creativity tests.
These dimensions form the backbone of traditional assessments like TTCT, WKCT, and AUT.
Underserved but Critical Dimensions:
Usefulness, convergent thinking, and problem sensitivity are often ignored.
The CANU model and modern tests like MONDALK aim to fill these gaps.
Visual and Emotional Creativity:
Only TCT-DP and MONDALK adequately assess visual creativity and emotional expressiveness.
TTCT touches on these, but not deeply.
Creative Personality vs. Performance:
Only BWAS explicitly measures creative preference/personality, not performance.
It complements but cannot replace idea-generation tests.
The Tests in Detail
Torrance Tests of Creative Thinking (TTCT): The Gold Standard in Creativity Assessment
The Torrance Tests of Creative Thinking (TTCT) are widely regarded as the most extensively researched and globally accepted standardized measures of creativity. Developed in the 1960s by psychologist Ellis Paul Torrance, the TTCT was originally designed to identify creative potential in schoolchildren but has since evolved into a leading tool for assessing creative thinking across age groups and cultures.
✅ Global Recognition and Enduring Influence
With translations in over 35 languages and usage in more than 50 countries, the TTCT has achieved unparalleled global reach. Its influence extends to education systems, gifted and talented programs, psychological research, and even corporate creativity assessments. Longitudinal studies spanning decades support the TTCT's predictive validity, and some analyses suggest it forecasts later creative achievement better than traditional intelligence (IQ) tests do (Kim, 2011).
🎯 What the TTCT Measures
The TTCT evaluates divergent thinking—the ability to generate multiple, varied, and original ideas in response to open-ended tasks. Specifically, it measures:
Fluency: The number of relevant ideas generated.
Originality: The statistical rarity or uniqueness of ideas.
Elaboration: The level of detail and development in each response.
Flexibility: The variety of conceptual categories in the ideas.
Resistance to Premature Closure: The ability to remain open and curious instead of settling quickly on obvious solutions.
🧪 Structure and Administration
There are two main versions of the TTCT:
Verbal TTCT: Participants are asked to respond to prompts using written or spoken language. Example: “What would happen if people could become invisible at will? List as many consequences as possible.”
Figural TTCT: Participants are given abstract shapes or incomplete images and asked to complete and title their drawings. The goal is not artistic skill but creative transformation and storytelling.
Scoring is conducted using a comprehensive rubric and can be both quantitative (e.g., number of ideas) and qualitative (e.g., emotional expressiveness, humor, or fantasy).
📝 Sample Tasks
Verbal: “List all the problems that might arise if there were no gravity.”
Figural: “Here is an incomplete shape. Turn it into a complete picture and give it a title.”
These tasks are designed to be open-ended, allowing multiple valid responses and encouraging imaginative thinking.
❗ Critiques and Limitations
Despite its popularity, the TTCT is not without criticism:
Cultural Bias: Scoring for originality is based on statistical norms, which may not transfer well across cultural contexts.
Overemphasis on Divergent Thinking: Creativity involves more than idea generation—it includes problem-solving, insight, and domain-specific skills.
Subjectivity in Scoring: While rubrics exist, some scoring elements (like emotional expressiveness) can be inconsistent across raters.
Limited Real-World Validity: Critics argue that TTCT scores don’t always reflect real-world creative accomplishment or innovation.
🛠️ How the TTCT Addresses These Critiques
The test has adapted over time to address many of its limitations:
Updated Norms: The TTCT periodically refreshes its scoring norms to reflect current cultural and generational trends.
International Calibration: Local adaptations and trained scorers help reduce cultural bias in diverse settings.
Broadened Scoring Rubric: The inclusion of storytelling, emotional expressiveness, and humor in figural tasks expands the creativity construct.
Training for Reliability: Scorers undergo standardized training to increase inter-rater reliability and reduce subjective bias.
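Why do refreshed norms matter? A norm-referenced score reports where a raw score falls relative to a reference group, so the same raw score can translate into a different standard score once the norms are updated. The minimal sketch below illustrates that mechanism with hypothetical norm samples; the function name `standard_score` and the reporting scale (mean 100, SD 15) are illustrative assumptions, not the scale or procedure defined in the TTCT manual.

```python
from statistics import mean, pstdev


def standard_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Convert a raw score into a standard score relative to a norm group.

    The default reporting scale (mean 100, SD 15) is an illustrative assumption.
    """
    z = (raw - norm_mean) / norm_sd       # distance from the norm mean, in SD units
    return scale_mean + scale_sd * z      # re-express z on the reporting scale


# Hypothetical raw fluency scores from two norm samples collected years apart
old_norms = [12, 15, 18, 20, 22, 25, 28]
new_norms = [15, 18, 21, 24, 26, 29, 33]

raw = 24
old_score = standard_score(raw, mean(old_norms), pstdev(old_norms))
new_score = standard_score(raw, mean(new_norms), pstdev(new_norms))
print(f"old norms: {old_score:.1f}, new norms: {new_score:.1f}")
```

With these made-up numbers, a raw score of 24 sits well above the older norm group's mean but only slightly above the newer group's, so its standard score drops even though the child's performance is identical.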
📌 Why the TTCT Still Matters
The Torrance Tests of Creative Thinking remain the benchmark tool for assessing creative potential in both research and practice. Their flexibility, empirical support, and adaptability make them especially useful for identifying creativity in education, guiding gifted programs, and inspiring new approaches to measuring innovation.
Wallach–Kogan Creativity Tests (WKCT): The Classic Model of Divergent Thinking
Developed by Michael Wallach and Nathan Kogan in the 1960s, the Wallach–Kogan Creativity Tests (WKCT) are among the earliest and most influential tools for measuring creativity through divergent thinking. Unlike more rigid, time-pressured tests, the WKCT introduced a “game-like” testing format, which has become a foundational principle in creativity assessment.
✅ Status and Acceptance
The WKCT is widely recognized in psychological and educational research as a foundational creativity test, particularly for children and young adults. Its focus on idea fluency, flexibility, and originality has made it a staple in studies of giftedness and creative potential. Although it is not as widely used as the TTCT in contemporary education, it remains highly respected in research contexts and inspired many later tests.
🎯 What It Measures
The WKCT is designed to assess divergent thinking by presenting open-ended problems and measuring the ability to produce a wide range of novel ideas. It focuses on three key dimensions:
Fluency – the number of responses given.
Originality – how rare or unconventional the responses are.
Flexibility – the number of different categories or themes represented.
Unlike some other tests, WKCT does not score for elaboration (detail) and focuses more on quantity and variety than complexity.
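Originality in group-administered divergent-thinking tasks is often scored relative to the other people tested. The simplified sketch below assumes originality is counted as uniqueness within the group (a response no other participant gave) and fluency as the number of distinct responses; the helper names and the sample answers are hypothetical, and flexibility is omitted because assigning categories requires a rater's judgment.

```python
from collections import Counter


def normalize(response):
    # Treat trivial variants such as "Ball" and " ball " as the same response
    return " ".join(response.lower().split())


def score_group(responses_by_person):
    """Fluency = distinct responses; uniqueness = responses nobody else in the group gave."""
    cleaned = {p: {normalize(r) for r in answers} for p, answers in responses_by_person.items()}

    # Count how many *people* gave each response, not how often it was written
    people_per_response = Counter()
    for distinct in cleaned.values():
        people_per_response.update(distinct)

    return {
        person: {
            "fluency": len(distinct),
            "uniqueness": sum(1 for r in distinct if people_per_response[r] == 1),
        }
        for person, distinct in cleaned.items()
    }


# Hypothetical answers to the Instances prompt "Name all the round things you can think of."
group = {
    "ana": ["ball", "moon", "coin", "wheel"],
    "ben": ["ball", "coin", "Ball"],
    "cai": ["moon", "bubble", "eye of a hurricane"],
}
scores = score_group(group)
print(scores)
```

Because uniqueness depends on who else was tested, the same answer can be "original" in one classroom and common in another, which is one reason originality scores are sensitive to the comparison group.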
🧪 Format and Administration
The WKCT includes both verbal and non-verbal tasks, typically the following five:
Instances Task – "Name all the round things you can think of."
Alternate Uses Task – "List all the things you can do with a brick."
Similarities Task – "In what ways are a cat and a refrigerator alike?"
Pattern Meanings Task – List all the things an abstract visual pattern could be.
Line Meanings Task – List all the things a simple line drawing could represent.
Tests are untimed or loosely timed and are administered in low-pressure environments to encourage spontaneity and creative risk-taking.
📝 Example Questions
"List all the things you can do with a paperclip."
"Name as many soft things as you can."
"In what ways are a chair and a sandwich similar?"
These types of questions are designed to free the mind from conventional thinking and encourage playful exploration of ideas.
❗ Critiques and Limitations
Limited Scoring Dimensions: The WKCT does not assess elaboration or emotional depth, unlike newer creativity tests.
Cultural Bias: As with most standardized creativity assessments, responses judged as "original" may be influenced by cultural norms.
Outdated Norms: The test was developed in the 1960s, and unless locally adapted, some prompts may feel dated or contextually narrow.
Scoring Subjectivity: Originality scoring can be ambiguous, especially without up-to-date normative data.
🛠️ How It Addresses These Issues
Flexible Administration: The untimed, low-stakes format helps reduce anxiety and fosters more authentic creativity.
Inspired Further Development: The WKCT helped popularize the untimed, game-like approach to divergent-thinking assessment and has been adapted in many derivative creativity tests worldwide, including the Hungarian MONDALK Test.
Research Utility: Though not used frequently in K–12 classrooms today, it remains valuable for experimental and cross-cultural creativity research.
📌 Why It Still Matters
The WKCT represents a pivotal moment in creativity assessment, shifting from rigid, IQ-style testing toward more fluid, expressive, and imaginative thinking. Its open-ended structure and rich variety of tasks make it a classic choice for researchers exploring creativity as a multidimensional cognitive skill.
Alternative Uses Test (AUT): The Quick and Powerful Divergent Thinking Task
The Alternative Uses Test (AUT), developed by J.P. Guilford and colleagues during the 1950s and 1960s and widely cited from Guilford's 1967 work, is one of the most widely used and straightforward tools for measuring creativity, particularly in experimental psychology and cognitive science. Despite its simplicity, the AUT remains a goldmine for insight into how people generate divergent ideas, especially under time constraints.
✅ Acceptance and Popularity
The AUT has become a benchmark test in psychological research on creativity due to its ease of use, minimal materials, and adaptability. It is commonly used in creativity labs, educational research, and studies of neuroscience and cognitive flexibility.
Although less comprehensive than the TTCT or WKCT, the AUT is incredibly efficient, making it ideal for studies involving large sample sizes, fMRI experiments, or short intervention sessions.
🎯 What It Measures
The AUT focuses on divergent thinking, particularly:
Fluency – number of ideas generated.
Originality – statistical uniqueness of responses.
Flexibility – variety in the types of responses.
Elaboration – amount of detail provided for each idea (optional in some versions).
🧪 Structure and Administration
Participants are asked to come up with as many unusual or creative uses as possible for a common object — usually within a 3–5 minute time limit per item. It’s typically administered with 1–3 items in a session.
📝 Example Prompts
“List as many uses as you can for a brick.”
“What can you do with a paperclip?”
“How many different ways could you use a newspaper?”
Responses may range from practical (e.g., "use a newspaper to wrap gifts") to bizarre ("fold it into a hat for a pigeon army"). The point is to break functional fixedness — the cognitive bias that limits objects to their conventional use.
🧮 Scoring System
Scoring is done on multiple dimensions:
Fluency: Total number of appropriate uses.
Originality: Responses are compared against a database or control group to determine rarity.
Flexibility: Number of different conceptual categories (e.g., using a brick for shelter vs. as a weapon).
Elaboration: Level of detail in responses (e.g., “Build a wall” vs. “Stack bricks in a curved pattern to make a low retaining wall”).
Some versions drop elaboration, focusing on the first three dimensions to save time.
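The scoring dimensions above can be sketched in a few lines. This is a minimal illustration, not a standard AUT scoring protocol: it assumes a trained rater has already mapped each response to a category, that originality is judged against counts from a reference sample, and that responses given by at most 5% of the sample earn 1 point and by at most 1% earn 2 points. Those thresholds, the function name `aut_scores`, and all data are hypothetical.

```python
def aut_scores(responses, category_of, sample_counts, sample_size,
               rare=0.05, very_rare=0.01):
    """Score one participant's AUT answers for fluency, flexibility, and originality.

    category_of:   response -> category assigned by a trained rater (hypothetical coding)
    sample_counts: response -> number of people in a reference sample who gave it
    rare / very_rare: illustrative rarity thresholds (share of the sample)
    """
    # Normalize and drop duplicates while preserving order
    answers = list(dict.fromkeys(r.strip().lower() for r in responses))

    fluency = len(answers)
    # Unmapped answers all fall into one shared "uncategorized" bucket
    flexibility = len({category_of.get(a, "uncategorized") for a in answers})

    originality = 0
    for a in answers:
        share = sample_counts.get(a, 0) / sample_size  # unseen answers count as maximally rare
        if share <= very_rare:
            originality += 2
        elif share <= rare:
            originality += 1
    return {"fluency": fluency, "flexibility": flexibility, "originality": originality}


# Hypothetical brick responses, rater coding, and reference-sample counts (n = 100)
categories = {
    "build a wall": "construction",
    "doorstop": "weight",
    "paperweight": "weight",
    "grind into red pigment": "art material",
}
counts = {"build a wall": 60, "doorstop": 30, "paperweight": 4, "grind into red pigment": 0}
result = aut_scores(["Build a wall", "doorstop", "Paperweight", "grind into red pigment"],
                    categories, counts, sample_size=100)
print(result)
```

Note that "doorstop" and "paperweight" add to fluency but share a category, so they raise fluency without raising flexibility; this is exactly the distinction the flexibility score is meant to capture.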
❗ Critiques and Limitations
Narrow Scope: The AUT primarily measures verbal idea generation and doesn't capture artistic, emotional, or interpersonal aspects of creativity.
Cultural Influence: What counts as “original” can vary significantly across cultures and contexts.
Scoring Variability: Without normed datasets or trained raters, originality and flexibility scoring can become inconsistent.
Risk of Memorized Responses: Participants familiar with the test might reuse “go-to” creative answers, reducing its validity over time.
🛠️ How It Overcomes These Critiques
Adaptability: The AUT is often paired with other tests or modified for different research goals (e.g., adding time pressure, altering objects, combining with fMRI).
Standardized Rubrics: Research labs use well-defined scoring rubrics and inter-rater reliability training to improve consistency.
Domain Expansion: Though traditionally verbal, the AUT has inspired visual and multimodal adaptations to tap into broader creative skills.
📌 Why It Still Matters
The Alternative Uses Test endures as a go-to assessment for researchers due to its efficiency, simplicity, and clear link to creative cognition. While not a full-spectrum creativity test, it excels at isolating and analyzing the ideation process — the first and often most vital step in creative thinking.
Urban–Jellen Test of Creative Thinking – Drawing Production (TCT-DP): Visual Creativity Beyond Words
The Urban–Jellen Test of Creative Thinking – Drawing Production (TCT-DP) is a non-verbal creativity assessment developed by Klaus Urban and Hans Jellen in the 1980s and 1990s. Designed to evaluate creative potential through drawing, this test has become especially useful for children, non-native speakers, and cross-cultural studies, where language might limit traditional creativity assessments.
✅ Status and Acceptance
The TCT-DP is widely used in educational psychology, particularly in Europe and Asia. It is commonly applied in schools to assess creative potential in children aged 5 to 17, as well as in international gifted and talent identification programs. Although less famous than the TTCT or AUT, it is highly regarded for its culture-fair design and broad developmental applicability.
🎯 What It Measures
The TCT-DP focuses on figural creativity and includes 14 scoring criteria rooted in Gestalt psychology, measuring traits such as:
Originality – uniqueness of completed images.
Elaboration – depth of detail added.
Boundary-breaking – crossing pre-existing shapes or page borders.
Perspective and humor – use of unconventional views or playful elements.
Emotional expressiveness and storytelling – suggested through visual composition.
These categories aim to capture creativity beyond verbal fluency or traditional IQ metrics.
🧪 Format and Administration
Participants are given a page with six abstract and incomplete figures (e.g., a half-circle, a squiggle, a dot) and are instructed to complete a drawing using the provided elements. They are encouraged to transform the fragments into a meaningful or imaginative picture and give the drawing a title.
Time: About 15–20 minutes
Materials: Pen, test sheet, optional coloring tools
📝 Example Task
A child receives a sheet with several shapes: a curved line, a small dot, an open triangle.
They might turn these into a drawing of a dragon flying over a hill, integrating the shapes creatively.
The title might be: “The Midnight Dragon Who Eats Stars.”
The scoring rubric would reward original combinations, cross-boundary creativity, and narrative richness.
❗ Critiques and Limitations
Subjective Scoring: Despite structured rubrics, interpretation of visual elements (e.g., emotional expressiveness) can vary between raters.
Artistic Bias Risk: Some worry that the test favors children with strong drawing skills, though it is not meant to assess technical ability.
Limited Verbal Component: Lacks measurement of verbal ideation or creativity in communication-heavy contexts.
🛠️ How It Addresses These Issues
Scoring Training: The TCT-DP includes a detailed manual and scoring system to ensure inter-rater reliability.
Focus on Process, Not Skill: Emphasis is placed on how creatively the individual uses the elements, not on how well they draw.
Language Independence: Its non-verbal design makes it ideal for multilingual classrooms or children with language processing challenges.
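Inter-rater reliability, mentioned above for the TCT-DP (and for the TTCT and AUT), is usually checked by having two trained raters score the same drawings and comparing their results. The sketch below computes a Pearson correlation between two raters' totals as one simple index; it is an assumption that this is how any particular manual defines reliability, and intraclass correlation coefficients are often preferred in practice because Pearson's r rewards consistent ranking rather than exact agreement. The scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two raters' scores for the same drawings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) if a rater gives identical scores


# Hypothetical total scores two trained raters gave the same six drawings
rater_a = [18, 25, 31, 12, 40, 22]
rater_b = [20, 24, 33, 10, 38, 25]
r = pearson_r(rater_a, rater_b)
print(f"inter-rater r = {r:.2f}")
```

A high r here means the raters order the drawings similarly; if one rater were systematically two points more generous, r would stay high, which is why absolute-agreement indices are also reported.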
📌 Why It Still Matters
The TCT-DP offers a unique, equitable way to assess visual and conceptual creativity, especially in young children or culturally diverse settings. Its structure encourages freedom, imagination, and rule-breaking — key ingredients in truly creative thought.
Barron–Welsh Art Scale (BWAS): Measuring Creative Preference Through Aesthetic Response
The Barron–Welsh Art Scale (BWAS) is a nonverbal assessment tool that measures creative personality traits by evaluating aesthetic preferences. Developed by Frank Barron and George S. Welsh in the 1950s and 60s, it offers a unique approach: instead of asking people to generate ideas, it gauges creativity based on their liking for complex, novel, and asymmetric images.
✅ Status and Acceptance
While not as commonly used in schools, the BWAS is well-established in personality psychology, creative arts research, and clinical settings. It's often used to complement cognitive creativity tests like the TTCT or AUT, helping researchers explore the personality side of creativity.
It has shown consistent correlations with traits such as originality, independence, openness to experience, and nonconformity, making it a valuable diagnostic tool in both research and applied psychology.
🎯 What It Measures
The BWAS doesn’t measure creativity in terms of productivity or idea generation. Instead, it measures creative disposition, particularly:
Preference for complexity and asymmetry
Tolerance for ambiguity
Openness to novel or unconventional stimuli
Rejection of symmetry, balance, or realism (preferences for these are often associated with conformity)
These preferences are strongly linked with creative personalities in fields such as art, design, writing, and scientific innovation.
🧪 Structure and Administration
The BWAS consists of 86 black-and-white abstract line drawings that range from balanced and symmetrical to irregular and chaotic. For each image, the participant simply answers:
“Do you like this picture?” (Yes or No)
Scoring compares each answer with an empirical key: high scorers tend to like the unconventional, asymmetric images and dislike the simple, symmetrical ones, favoring unpredictable, nontraditional forms.
📝 Example Items
A balanced image resembling a star or geometric figure (expected to be disliked by high scorers).
A tangled, distorted scribble with no clear pattern (expected to be liked by high scorers).
The BWAS is intentionally non-representational to remove cultural or content-based bias, focusing purely on visual taste.
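Because the example items above are expected to be liked or disliked by high scorers, a keyed score can award a point whenever a participant's yes/no answer matches the keyed direction. The sketch below assumes that two-directional keying; the function name, the four-item key, and the answers are hypothetical and do not reproduce the actual BWAS item key.

```python
def keyed_score(answers, key):
    """Count items where a like/dislike answer matches the scoring key.

    answers: item number -> True if the participant answered "Yes, I like it"
    key:     item number -> True if the keyed (high-scorer) response is "like"
    Items missing from the key are ignored.
    """
    return sum(1 for item, liked in answers.items()
               if item in key and liked == key[item])


# Hypothetical 4-item key: items 1-2 are tangled, asymmetric figures (keyed "like");
# items 3-4 are balanced, star-like figures (keyed "dislike").
key = {1: True, 2: True, 3: False, 4: False}
participant = {1: True, 2: False, 3: False, 4: False}
print(keyed_score(participant, key))  # → 3
```

Keying both directions means a participant who simply says "yes" to everything does not automatically score high, since every disliked-keyed item then costs a point.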
❗ Critiques and Limitations
Not a performance test: It doesn’t show what someone can do creatively, just what they tend to like.
Context-limited: Works well with adults and adolescents, but may be too abstract for younger children.
Cultural interpretation: While designed to be culture-free, perceptions of what is “novel” or “weird” may still vary across societies.
Art background influence: People with training in art or design may score higher simply due to familiarity, not higher creativity.
🛠️ How It Addresses These Issues
Nonverbal & Minimal Instructions: The binary format makes it accessible across languages, cultures, and education levels.
Use as a Complement: It is typically used in conjunction with other tests (e.g., TTCT or AUT) to provide a fuller picture of creativity, balancing performance and preference.
Supported by Research: Numerous studies have shown high scorers on BWAS often produce more creative work in fields ranging from fine arts to scientific problem-solving.
📌 Why It Still Matters
The Barron–Welsh Art Scale reveals a hidden dimension of creativity—not what people create, but what they’re drawn to. It’s a valuable tool for understanding how personality and perception shape creative potential, especially in adults and cross-disciplinary contexts.
Creativity Assessment via Novelty and Usefulness (CANU): A Modern, Practical Creativity Metric
The CANU model — short for Creativity Assessment via Novelty and Usefulness — is a contemporary creativity assessment framework developed to address many limitations of traditional tests. It evaluates creativity by scoring participants' solutions to problems based on two essential criteria: how novel and how useful they are. Unlike traditional divergent thinking tasks that focus heavily on idea quantity, CANU emphasizes quality and applicability, making it ideal for real-world and workplace settings.
✅ Status and Relevance
While newer and still gaining traction in academic circles, the CANU model has been well-received in innovation research and business psychology, especially for use in online environments. It’s designed to be scalable, objective, and domain-adaptable, making it one of the most promising modern creativity assessments (Prasch et al., 2020).
🎯 What It Measures
CANU is based on the widely accepted definition of creativity as the production of ideas or products that are both:
Novel – surprising, uncommon, or original.
Useful – relevant, functional, or effective in solving a problem.
By directly scoring these two dimensions, CANU avoids over-reliance on fluency and sidesteps the ambiguity of open-ended creativity scoring.
🧪 Format and Administration
Participants complete structured tasks such as:
Designing solutions to hypothetical or real-world problems.
Proposing product ideas, user experiences, marketing strategies, or technical fixes.
Responses are scored along two dimensions:
Novelty score – how unexpected, rare, or original the solution is.
Usefulness score – how practically viable or effective the idea is in context.
This dual-rating system can be carried out manually by expert raters, by crowd-sourced evaluators, or automatically with AI in digital formats.
📝 Example Task
Prompt: “Design a new type of lunchbox for children aged 6–10 that helps them eat healthier.”
Scoring:
Novelty: A lid made with thermochromic (heat-sensitive, color-changing) material that signals when food is getting too warm → High.
Usefulness: Portioned compartments and a built-in cooler pack that keep food fresh → High.
A creative idea might score high on both, while an impractical but original idea (e.g., a lunchbox that floats) might score high on novelty but low on usefulness.
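The two-dimensional logic of the lunchbox example can be sketched as follows. This is a minimal illustration assuming each rater scores novelty and usefulness on a 1–5 scale and the ratings are averaged; the scale, the 3.5 threshold, the quadrant labels, and the function name `canu_profile` are hypothetical, not part of a published CANU specification.

```python
from statistics import mean


def canu_profile(novelty_ratings, usefulness_ratings, threshold=3.5):
    """Average raters' 1-5 ratings on each dimension and label the idea.

    The 1-5 scale, the threshold, and the labels are illustrative assumptions.
    """
    n, u = mean(novelty_ratings), mean(usefulness_ratings)
    if n >= threshold and u >= threshold:
        label = "creative (novel and useful)"
    elif n >= threshold:
        label = "original but impractical"
    elif u >= threshold:
        label = "useful but conventional"
    else:
        label = "neither novel nor useful"
    return {"novelty": n, "usefulness": u, "label": label}


# Hypothetical ratings from three raters for two lunchbox ideas in the example
color_changing = canu_profile([5, 4, 5], [4, 4, 5])
floating = canu_profile([5, 5, 4], [1, 2, 1])
print(color_changing["label"], "|", floating["label"])
```

Keeping the two averages separate, rather than collapsing them into one number, preserves the distinction the example draws between an idea that is merely unusual and one that is both unusual and workable.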
❗ Critiques and Limitations
Requires Human or AI Scorers: Whether ratings come from people or from automated systems, judgments of novelty and usefulness can be subjective or inconsistent without clear guidelines.
Task-specific bias: Performance may vary by domain; someone might score highly on creative writing but not on engineering tasks.
Less focus on process: It evaluates the product of creativity, not the cognitive process behind it.
🛠️ How It Addresses Classic Creativity Test Flaws
Objective Rubric: CANU explicitly defines what it is measuring (novelty and usefulness), avoiding vague constructs like "emotional richness" or "boundary-breaking."
Real-world Relevance: It aligns closely with workplace innovation, design thinking, and entrepreneurial creativity.
Digital Integration: It’s designed for online environments, making it scalable and adaptable to remote testing or classrooms.
📌 Why It Represents the Future
The CANU model reflects a shift from theoretical or lab-based creativity assessment to applied creativity — the kind that drives innovation, design, and problem-solving in the real world. It offers a powerful, modern alternative to traditional divergent thinking tests by focusing on impact, not just ideation.
MONDALK Test: A Modern, Culturally Adapted Tool for Measuring Divergent Thinking
The MONDALK Test is a Hungarian-developed creativity assessment designed to measure divergent thinking across visual, linguistic, and productivity domains. It was built upon and refined from the Wallach–Kogan Creativity Tests (WKCT) and is specifically structured to reduce bias in the evaluation of socially disadvantaged or at-risk students — a critical innovation in the field of giftedness and equity in education.
✅ Status and Use
Though primarily used in Hungary and Central Europe, the MONDALK Test represents a model of modern, culturally sensitive creativity assessment. It is recognized for its high internal consistency, psychometric validity, and role in identifying gifted children who might otherwise be missed by traditional IQ-based methods (Rákóczi & Szitó, 2021).
🎯 What It Measures
The MONDALK Test evaluates divergent thinking through:
Verbal fluency and originality (e.g., alternate uses, associations)
Visual creativity (pattern transformation)
Productivity of ideas (idea generation across multiple formats)
It is also structured to detect:
Domain-specific strengths (e.g., visual over verbal creativity)
Creative potential in marginalized students by using more equitable cutoff criteria
🧪 Format and Administration
The MONDALK Test consists of multiple subtests divided into visual, verbal, and hybrid components. It is offered in A and B versions to avoid familiarity bias during retesting.
Visual tasks: Drawing completions, symbolic transformations
Verbal tasks: Word associations, unusual uses
Scoring: Emphasizes originality, fluency, and quality of originality
The test is standardized on a nationally representative sample of over 1,200 students aged 7–18.
📝 Example Task
“Here are five random shapes. Turn each one into a meaningful image or symbol.”
“Name as many uses as you can for a ‘drumstick’ — not just musical or food-related.”
Scoring focuses on both novelty and domain-specific depth (e.g., verbal richness vs. visual expressiveness).
❗ Critiques and Limitations
Limited international use: Its cultural specificity and Hungarian-language design limit its global scalability.
Training required: Proper administration and scoring require knowledge of its rubric and local adaptation guidelines.
Less known in English-speaking research: It remains underrepresented in global meta-analyses or standardized testing databases.
🛠️ How It Advances Creativity Testing
Equity in gifted identification: By adjusting originality thresholds (e.g., 80th percentile instead of 90th), the MONDALK Test helps reduce test bias and identify hidden creative potential in students from lower-income or marginalized backgrounds.
Standardized and validated: The test is thoroughly standardized, with confirmatory factor analysis supporting its three-factor structure.
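The effect of lowering an originality threshold from the 90th to the 80th percentile can be illustrated mechanically. The sketch below assumes a percentile rank is the share of a norm sample scoring strictly below a student; the norm sample, student scores, names, and helper functions are all hypothetical. It only shows that a lower cutoff flags more students; whether that improves equity depends on the population and the norms, which the code does not model.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm sample scoring strictly below `score`."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)


def flag_students(students, norm_scores, cutoff):
    """Names of students whose originality percentile rank is at or above `cutoff`."""
    return [name for name, score in students.items()
            if percentile_rank(score, norm_scores) >= cutoff]


# Hypothetical norm sample (originality scores 1..20) and four students
norm_scores = list(range(1, 21))
students = {"Anna": 19, "Bence": 17, "Csilla": 18, "Dani": 12}

strict = flag_students(students, norm_scores, 90)   # conventional cutoff
lenient = flag_students(students, norm_scores, 80)  # lowered cutoff
print(strict, lenient)
```

Here only Anna clears the 90th-percentile bar, while Bence and Csilla, who sit just below it, are also identified at the 80th percentile.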
📌 Why It Matters
The MONDALK Test demonstrates how creativity assessments can be localized, inclusive, and psychometrically strong — bridging the gap between standardized testing and educational equity. It serves as a model for other regions to develop culturally sensitive tools that go beyond IQ to recognize creative talent in all its forms.