This discussion guide is designed for agency concept testing using AI-moderated or human-moderated interviews. It provides the foundation for testing product concepts, campaign ideas, packaging designs, or service propositions across multiple clients with a single repeatable framework. For the CPG in-house variant structured around stimulus exposure mechanics, a 12-question coding sheet, and shelf-displacement framing, see the CPG concept testing discussion guide template.
The five-phase laddering structure below is the agency spine — five sequential phases each driving toward Level 5+ identity reasoning, optimized for cross-client portability where the same framework adapts across categories. The sequence is designed to protect the signal value of first impressions (Phase 2) by collecting them before any guided exploration begins, and to build depth progressively through relevance, credibility, and comparative evaluation. Rearranging phases — asking about competitive alternatives before assessing personal relevance, for instance — produces systematically biased responses. Run the phases in order.
Guide Structure: 30-Minute Concept Test
Phase 1: Warm-Up and Category Context (0-3 min)
Objective: Establish rapport and understand current category relationship. The warm-up serves two functions: it calibrates the moderator to the respondent’s level of category engagement, and it surfaces the reference frame the respondent brings to the concept before they see it. A respondent who is a daily category user will react to a concept differently than an occasional one — knowing which type you’re talking to before the concept appears changes how you interpret their reaction.
Questions:
- “Tell me about the last time you [bought/used/thought about] something in [category]. What was the situation?”
- “When you think about [category], what brands or products come to mind first?”
Timing note: Three minutes is sufficient. Do not run over: Phase 2 is where the primary data lives, and any extra time spent on the warm-up comes at Phase 2’s expense.
Phase 2: Unstructured Concept Reaction (3-8 min)
Objective: Capture genuine first impressions before guided exploration. This is the highest-signal phase of the concept test. Respondents’ unprompted reactions (what they notice first, how they describe the concept in their own words, what they find confusing) reveal the real-world cognitive processing that determines whether a concept will succeed or fail in market.
Show the concept, then:
- “Take a moment to look at this. Tell me your honest first reaction — whatever comes to mind.”
- “What stands out to you? Walk me through what you notice.”
- “If a friend asked what this is, how would you describe it?”
Laddering probes:
- “You mentioned [attribute]. What about that caught your attention?” (Level 2)
- “When something has [attribute], what does that mean for you?” (Level 3)
- “Why is that important to you specifically?” (Level 4)
- “What kind of person values that? Do you see yourself that way?” (Level 5+)
AI moderation advantage in Phase 2: In a human-moderated session, the moderator must make real-time judgments about which attributes to probe. An AI moderator running 5-7 levels of laddering probes applies the same depth to every response from every participant, eliminating the moderator variability that typically produces uneven depth across a focus group or IDI series.
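One way to keep the Phase 2 ladder identical across sessions is to store the probes as data rather than improvising them. The sketch below is illustrative only: `LADDER`, `build_probes`, and the example attribute are hypothetical names, not part of any moderation platform.

```python
# Hypothetical sketch: the Phase 2 laddering probes stored as (level, template) pairs,
# so every respondent who mentions an attribute receives the same sequence.
LADDER = [
    (2, "You mentioned {attribute}. What about that caught your attention?"),
    (3, "When something has {attribute}, what does that mean for you?"),
    (4, "Why is that important to you specifically?"),
    (5, "What kind of person values that? Do you see yourself that way?"),
]


def build_probes(attribute: str) -> list[tuple[int, str]]:
    """Return the (level, probe text) sequence for one attribute a respondent mentioned."""
    return [(level, template.format(attribute=attribute)) for level, template in LADDER]


# Example attribute is invented for illustration.
probes = build_probes("the resealable lid")
for level, text in probes:
    print(f"Level {level}: {text}")
```

A human moderator can use the same table as a checklist; the point is only that the sequence does not depend on who is running the session.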
Phase 3: Relevance and Need-State (8-15 min)
Objective: Assess fit with real consumer needs and current alternatives. This phase moves from “what do you think of this?” to “where does this fit in your life?” — a shift that separates respondents with genuine purchase potential from those who find the concept interesting but irrelevant to their actual situation.
Questions:
- “Where does this fit — or not fit — into how you currently handle [need]?”
- “If this existed today, would it replace something, add to it, or would you not need it?”
- “Think about the last time you wished something like this existed. What was happening?”
Key probe: “You said you’d replace [current solution] with this. What specifically about [current solution] are you ready to leave behind?” This probe surfaces the unmet expectation that the concept addresses — which is often more actionable for positioning and messaging than the concept’s stated benefits.
Segment analysis note: Relevance and need-state responses vary significantly by usage segment. When running concept tests across multiple audience types (heavy users vs. light users, early adopters vs. mainstream), Phase 3 data should be analyzed by segment before aggregating — the aggregate often conceals strong segment-level signals that change the strategic recommendation.
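The segment-before-aggregate point can be illustrated with a small calculation. This is a minimal sketch with invented data: the segment labels, the `responses` list, and `relevance_by_segment` are assumptions for illustration, and the True/False coding stands in for whatever relevance coding the study actually uses.

```python
from collections import defaultdict

# Hypothetical Phase 3 coding: (segment, found_concept_relevant) per respondent.
responses = [
    ("heavy", True), ("heavy", True), ("heavy", True), ("heavy", False),
    ("light", False), ("light", False), ("light", True), ("light", False),
    ("light", False), ("light", False),
]


def relevance_by_segment(rows):
    """Return {segment: share of respondents coded as finding the concept relevant}."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [relevant, total]
    for segment, relevant in rows:
        counts[segment][0] += int(relevant)
        counts[segment][1] += 1
    return {seg: rel / total for seg, (rel, total) in counts.items()}


by_segment = relevance_by_segment(responses)
overall = sum(relevant for _, relevant in responses) / len(responses)

for segment, share in by_segment.items():
    print(f"{segment}: {share:.0%}")
print(f"all respondents: {overall:.0%}")
```

In this invented sample the aggregate sits well below the heavy-user figure, which is the pattern the note warns about: the blended number alone would hide a strong segment.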
Phase 4: Credibility and Barriers (15-22 min)
Objective: Identify believability gaps and purchase barriers. This phase is where many concept tests produce their most actionable findings. A concept can be highly appealing (Phase 2) and clearly relevant (Phase 3) but still face credibility barriers that would prevent purchase. Phase 4 finds them before they become market-facing problems.
Questions:
- “What feels realistic about this? What feels like a stretch?”
- “If a brand made this claim, what would you need to see to believe it?”
- “What would stop you from trying this, even if you were interested?”
Common barrier categories: Price barrier (“this sounds more expensive than I’d pay”), brand fit barrier (“I can imagine other brands making this, but not [this brand]”), habit barrier (“I’m so used to [current solution] that switching feels like too much work”), and proof barrier (“I’d want to see reviews/evidence before trying”).
Laddering into barriers: “You mentioned price as a concern. If the price were half of what you expect, would your other concerns change? What would remain?” This probe separates price-dominant barriers from multi-factor barriers, which require different strategic responses.
Phase 5: Comparative Evaluation (22-30 min)
Objective: Position concept against alternatives and extract final assessment. The comparative evaluation phase produces the deliverable most clients use for concept prioritization: a clear ranking of the concept versus alternatives, with articulated reasons that are grounded in the respondent’s personal situation rather than abstract preference.
Questions:
- “Compared to [current solution/competitor], how does this stack up?”
- “If you could change one thing about this concept, what would it be?”
- “If this concept were a person, how would you describe them?”
The projective closing question: The “if this concept were a person” question reliably surfaces brand character and personality dimensions that direct evaluation questions miss. A concept that respondents describe as “reliable but boring” needs different positioning work than one described as “exciting but risky.” These implicit associations often predict competitive positioning more accurately than explicit preference rankings.
How Do You Score Concept Test Results?
Raw interview transcripts do not answer the client’s question: should we develop this concept or not? Four metrics structure the analysis and produce a defensible recommendation.
Spontaneous appeal rate: The percentage of respondents whose Phase 2 first reaction is net positive (describing a benefit or use case) versus net neutral or negative (describing confusion or indifference). A concept with 60%+ spontaneous appeal has real market potential. Below 40%, reconsider the concept or the target audience.
Relevance score: In Phase 3, what percentage of respondents identified a specific personal need the concept addresses and could describe a concrete situation where they would use it? A 70%+ relevance score on a well-screened sample is strong. Below 50% in the target audience typically indicates positioning misalignment rather than product failure — the concept may be solving the right problem with the wrong framing.
Top barrier concentration: Identify the single most-cited barrier in Phase 4. If 60%+ of respondents cite the same barrier, it is a development priority. Diffuse barrier patterns (no single barrier cited by more than 25% of respondents) typically indicate that the concept is credible overall but not personally compelling — a different problem than a concentrated barrier, requiring a different fix.
Comparative preference rate: In Phase 5, what percentage of respondents prefer this concept to their current solution for the use case they described? 40%+ is competitive. 60%+ is strong. Below 30%, the concept needs work before development investment.
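The four metrics above reduce to simple proportions once each interview has been coded. The sketch below assumes one yes/no flag per respondent for Phases 2, 3, and 5, and a set of cited barrier categories for Phase 4; the function names and sample data are hypothetical. The thresholds are the ones stated above. Where the guide leaves a range unlabeled (for example, 30-40% comparative preference), the code says so rather than guessing.

```python
from collections import Counter


def rate(flags):
    """Share of respondents whose coded yes/no flag is True."""
    return sum(flags) / len(flags)


def band(value, upper, lower):
    """Place a rate relative to one metric's upper and lower thresholds."""
    if value >= upper:
        return "above upper threshold"
    if value < lower:
        return "below lower threshold"
    return "between thresholds"


def comparative_label(value):
    """Phase 5 labels: 60%+ strong, 40%+ competitive, below 30% needs work."""
    if value >= 0.60:
        return "strong"
    if value >= 0.40:
        return "competitive"
    if value < 0.30:
        return "needs work"
    return "between thresholds"  # 30-40% is not labeled in the guide


def top_barrier(cited_sets):
    """Most-cited Phase 4 barrier, its share of respondents, and the pattern."""
    counts = Counter(b for cited in cited_sets for b in cited)
    if not counts:
        return None, 0.0, "no barriers cited"
    name, n = counts.most_common(1)[0]
    share = n / len(cited_sets)
    if share >= 0.60:
        pattern = "concentrated"
    elif share <= 0.25:
        pattern = "diffuse"
    else:
        pattern = "mixed"  # falls between the guide's two described cases
    return name, share, pattern


# Hypothetical coded data for 10 interviews.
appeal = [True] * 7 + [False] * 3       # Phase 2: net-positive first reaction
relevance = [True] * 6 + [False] * 4    # Phase 3: named a concrete need and situation
preference = [True] * 5 + [False] * 5   # Phase 5: prefers concept to current solution
barriers = [{"price"}] * 6 + [{"habit"}] * 2 + [{"proof", "price"}] + [set()]

print("appeal:", rate(appeal), band(rate(appeal), 0.60, 0.40))
print("relevance:", rate(relevance), band(rate(relevance), 0.70, 0.50))
print("preference:", rate(preference), comparative_label(rate(preference)))
print("top barrier:", top_barrier(barriers))
```

Storing barriers as a set per respondent means someone who mentions price three times still counts once, which matches the "percentage of respondents" wording of the metric.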
What Does a Concept Testing Deliverable Look Like?
A well-structured concept test deliverable for agency clients covers four sections: concept summary (what was tested, who was recruited, how many interviews), appeal and relevance findings (Phases 2-3 data), barrier analysis (Phase 4 data with barrier categories and quotes), and strategic recommendation (go/refine/redirect, with supporting rationale).
The deliverable is typically 10-15 slides for a single-concept test, 15-20 slides for a two- or three-concept battery. It should include six to eight or more verbatim quotes distributed across findings. These are not decorative evidence; they are the specific language respondents use, which often feeds directly into messaging and copy development.
The Intelligence Hub on the User Intuition platform indexes all concept test findings by topic, making them searchable for future studies. When a client commissions a second concept test six months later, prior findings surface automatically as context — reducing the exploratory time on the new study by 30-40%.
When Should Agencies Use This Template?
The 30-minute concept test format works for most agency concept testing needs: product concepts, service propositions, campaign ideas, packaging designs, and messaging frameworks. Three situations require adaptation.
Multi-concept batteries: Testing two or three concepts simultaneously. Adaptation: allocate 20 minutes per concept, reduce warm-up to 2 minutes, add a comparative ranking question after all concepts have been evaluated. Total interview time 45-60 minutes.
B2B concept testing: Testing concepts with business buyers rather than consumer audiences. Adaptation: Phase 3 (Relevance) expands to include organizational fit questions (“How would this work within your team’s current process?”) and stakeholder influence questions (“Who else in your organization would need to approve this?”). Total interview time 40-45 minutes.
Early-stage concept testing: Testing a rough concept direction rather than a developed proposition. Adaptation: Phase 2 becomes more generative — asking respondents to build on the concept rather than just evaluate it. Replace “what would stop you from trying this” (Phase 4) with “how would you need this to work to be interested?” This produces concept development input rather than go/no-go data.
For agencies building concept testing into a retainer service, see the agency research retainer pricing models for how to structure recurring concept validation work and the agency competitive analysis discussion guide for pairing concept tests with competitive context studies. For the full agency research overview, see consumer research for agencies and how to build research retainer services for agency clients.
How Does AI Moderation Change Concept Testing Results?
The difference between AI-moderated and human-moderated concept tests is not quality; it is consistency and scale. A skilled human moderator can conduct a 30-minute concept test at or above the depth this guide describes. The problem is that no human moderator delivers that quality across 100 simultaneous sessions. At scale, human moderation produces uneven depth: some participants receive thorough laddering probes; others receive rushed follow-ups because the moderator is managing time or rapport differently in each session.
AI moderation applies the same laddering depth to every response from every participant. When 100 respondents complete the Phase 2 unstructured reaction, all 100 receive equivalent probing depth at Levels 2 through 5. The comparative analysis that follows — what percentage of respondents reached identity-level reasoning (Level 5) about this concept? — is clean because the methodology did not vary.
The second difference is speed. A traditional 15-20 person focus group series for concept testing — recruit, schedule, moderate, transcribe, analyze — runs 4-8 weeks. AI moderation fields 100 concept test interviews simultaneously and closes them in days. For agencies working on client timelines tied to product launches, campaign approvals, or innovation pipeline reviews, that speed is not a minor convenience. It changes what concept testing can be used for — weekly validation of evolving concepts, rapid iteration between rounds, concept testing tied to specific weekly planning cycles rather than quarterly research reviews.
Participant satisfaction across the platform runs at 98%, which matters for concept testing specifically. Concept tests ask participants to engage with an unfamiliar idea and reason about it carefully for 30 minutes. Low-quality interview experiences — confusing interfaces, technical friction, poor question sequencing — produce shorter, less reflective answers. When 98% of participants report a satisfying experience, the interview quality floor stays high across the full sample, not just the cooperative respondents.
Fielding this template with User Intuition
The five-phase template is built to drive every participant toward Level 5+ identity reasoning, and that depth is only reliable if the laddering probes fire consistently across the full sample. User Intuition’s AI moderator applies the Phase 2 ladder — “what about that caught your attention?” through “what kind of person values that?” — at equivalent depth to every respondent, so the comparative scoring that follows is clean: when an agency reports what percentage of respondents reached identity-level reasoning about a concept, that figure is not distorted by which sessions a tired human moderator probed thoroughly and which got rushed.
For agencies, the differentiating capability is that the template moves from a quarterly research exercise to a weekly validation tool. A 30-interview concept test fields and closes inside 24 hours, drawing participants from a 4M+ panel screened to the concept’s target audience, which means concepts can be tested between iteration rounds and tied to specific weekly planning cycles rather than waiting for a 4-8 week focus-group series. The Intelligence Hub indexes every test so a client’s second concept test six months later surfaces the first as context, cutting exploratory time. For the methodology this template supports, see the concept testing solution overview, or book a demo to run this discussion guide on a live concept.
Concept Test Question Bank
Additional questions for each phase, for adaptation based on concept type:
Phase 2 additions (for packaging or visual concepts):
- “If you saw this on a shelf next to [competing products], what would you think it was for?”
- “What does the visual design tell you about the quality level?”
Phase 3 additions (for service concepts):
- “Walk me through how you’d use this in a typical week. What would trigger you to reach for it?”
- “Is there a type of person who would use this more than you would? Describe them.”
Phase 4 additions (for premium-priced concepts):
- “At what price point does this go from ‘worth trying’ to ‘worth sticking with’?”
- “What would make you confident enough in this to recommend it to someone else before you’ve tried it yourself?”
Phase 5 additions (for concept battery comparisons):
- “Of the concepts you’ve seen today, which one addresses the most important problem in your life right now?”
- “Which concept would you be most likely to describe to a friend this week? What would you say?”
These additions are meant to be selected, not appended wholesale. Each concept test should draw from the core five-phase structure and add 2-4 supplemental questions matched to the specific concept type and client objective. Expanding the guide beyond 30 minutes requires reducing depth somewhere else — a 35-minute guide should have a shorter Phase 1 or a more focused Phase 5, not more questions at the same depth.
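The time-budget rule can be checked mechanically before a guide goes into field. The sketch below encodes the five phase windows from this guide and flags gaps, overlaps, or an overrun; `PHASES` and `check_budget` are hypothetical names, and the 30-minute limit is the guide's default.

```python
# Phase windows from the guide, in minutes: (name, start, end).
PHASES = [
    ("Warm-Up and Category Context", 0, 3),
    ("Unstructured Concept Reaction", 3, 8),
    ("Relevance and Need-State", 8, 15),
    ("Credibility and Barriers", 15, 22),
    ("Comparative Evaluation", 22, 30),
]


def check_budget(phases, limit=30):
    """Return a list of problems: gaps or overlaps between phases, or a total over the limit."""
    problems = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(phases, phases[1:]):
        if end_a != start_b:
            problems.append(f"{name_a} ends at {end_a} but {name_b} starts at {start_b}")
    total = phases[-1][2] - phases[0][1]
    if total > limit:
        problems.append(f"guide runs {total} min, over the {limit}-min limit")
    return problems


print(check_budget(PHASES))  # the guide as written

# A variant where supplemental questions stretch Phase 5 by five minutes.
stretched = PHASES[:-1] + [("Comparative Evaluation", 22, 35)]
print(check_budget(stretched))
```

When the check reports an overrun, the fix the guide recommends is to shorten Phase 1 or narrow Phase 5 rather than to raise the limit.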