Reference Deep-Dive · 6 min read

How to Design a Concept Testing Discussion Guide

By Kevin, Founder & CEO

What a Discussion Guide Is (and Is Not)


A discussion guide is the research instrument that determines what you learn from every concept test interview. It is not a script to be read verbatim. It is a structured framework that specifies what ground to cover, what questions to ask, and how deeply to probe—while allowing natural conversational flow.

The guide serves three functions:

  1. Coverage guarantee: Every participant gets asked about the same core topics
  2. Probing instruction: The moderator (human or AI) knows when and how to dig deeper
  3. Quality control: Consistent structure enables valid comparison across participants

Poor discussion guides produce poor data regardless of sample quality, moderator skill, or analytical sophistication. The guide is upstream of everything.

The Six-Section Structure


Every concept testing discussion guide should follow this sequence. The order matters—each section builds on the previous one.
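
To make the time budget concrete, here is a minimal sketch in Python (purely illustrative; the section names and minute ranges come from this article, while the data structure itself is hypothetical) encoding the six sections. The totals show how little slack a typical session has.

```python
# Illustrative sketch: the six-section structure as data, so coverage and
# timing can be sanity-checked before fieldwork. Class and field names are
# hypothetical, not a real platform schema.
from dataclasses import dataclass

@dataclass
class GuideSection:
    name: str
    minutes_min: int
    minutes_max: int

GUIDE = [
    GuideSection("Rapport and Context", 3, 5),
    GuideSection("Stimulus Presentation", 1, 2),
    GuideSection("First Reaction (Unaided)", 3, 5),
    GuideSection("Structured Probing", 12, 18),
    GuideSection("Comparative Evaluation", 5, 8),
    GuideSection("Improvement and Close", 3, 5),
]

low = sum(s.minutes_min for s in GUIDE)   # 27
high = sum(s.minutes_max for s in GUIDE)  # 43
print(f"Planned interview length: {low}-{high} minutes")
```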

Section 1: Rapport and Context (3-5 minutes)

Establish comfort and gather category context before showing any concept. This section serves two purposes: putting the participant at ease, and understanding their current category relationship so you can interpret concept reactions against their baseline.

Key questions:

  • “Tell me about the last time you [relevant category behavior].”
  • “What matters most to you when choosing [category product]?”
  • “What frustrates you about your current options?”

Do not mention the concept, the brand, or the specific product category framing yet. You want unprimed category context.

Section 2: Stimulus Presentation (1-2 minutes)

Present the concept with clear, neutral instructions. How you present the stimulus shapes every reaction that follows.

Guidelines:

  • Give participants enough time to absorb the concept (at least 30 seconds for visual concepts, full read-through for written ones)
  • Use neutral framing: “I am going to show you an idea for a product/service. There are no right or wrong reactions.”
  • Do not explain or contextualize the concept beyond what the stimulus itself contains—if you need to explain it, the concept is not ready for testing
  • In AI-moderated interviews, the stimulus is presented digitally with consistent formatting and timing (see the sketch below)
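
For automated sessions, these guidelines can be enforced mechanically. A minimal sketch, assuming a hypothetical `display_fn` that renders the concept; the 30-second floor comes from the first guideline, and the written-concept read time is an assumed default, not a prescribed value.

```python
# Illustrative only: enforcing the presentation guidelines mechanically.
# `display_fn` is a hypothetical callable that renders the concept.
import time

NEUTRAL_FRAMING = (
    "I am going to show you an idea for a product/service. "
    "There are no right or wrong reactions."
)

MIN_VISUAL_EXPOSURE_SECONDS = 30  # floor for visual concepts, per the guideline

def present_stimulus(display_fn, is_visual: bool, read_seconds: int = 60):
    print(NEUTRAL_FRAMING)
    display_fn()  # render the stimulus exactly as authored, with no added explanation
    exposure = MIN_VISUAL_EXPOSURE_SECONDS if is_visual else read_seconds
    time.sleep(exposure)  # hold all questions until the exposure window elapses
```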

Section 3: First Reaction—Unaided (3-5 minutes)

Capture the unstructured, top-of-mind reaction before any directed questioning. This is the most fragile and most valuable part of the interview.

The opening question should be broad and non-directive:

  • “What are your first thoughts?”
  • “What stands out to you?”
  • “Talk me through your initial reaction.”

Then probe the first reaction with one level of depth:

  • “You mentioned [X]. Tell me more about that.”
  • “You seemed to focus on [element]. What drew your attention there?”

Do not ask about specific concept elements yet. Do not ask whether they like it. The unaided reaction reveals what is most salient—which is often not what the concept creator expected.

Section 4: Structured Probing (12-18 minutes)

This is the core of the interview. Probe five dimensions systematically:

  • Appeal (emotional and rational attraction): “What, if anything, appeals to you about this?”
  • Clarity (whether the concept communicates clearly): “In your own words, what is this offering?”
  • Relevance (personal applicability): “How well does this fit with your life/needs right now?”
  • Uniqueness (differentiation from existing options): “How is this different from what is already available to you?”
  • Intent (behavioral likelihood): “How likely would you be to try this? Walk me through your thinking.”

Each dimension gets its own probing sequence. Do not rush through all five to check boxes—depth on three dimensions is more valuable than surface coverage of five.
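
Where the guide is executed by software, the list above reduces to a small configuration. A sketch with hypothetical names; the opening probes are verbatim from the list, and the prioritization shown is an example of the depth-over-coverage advice, not a rule.

```python
# The five probing dimensions as a config an automated moderator could
# iterate over. Variable names are illustrative.
PROBES = {
    "appeal":     "What, if anything, appeals to you about this?",
    "clarity":    "In your own words, what is this offering?",
    "relevance":  "How well does this fit with your life/needs right now?",
    "uniqueness": "How is this different from what is already available to you?",
    "intent":     "How likely would you be to try this? Walk me through your thinking.",
}

# Depth beats coverage: probe a prioritized subset deeply rather than
# racing through all five.
PRIORITY = ["appeal", "relevance", "intent"]

for dimension in PRIORITY:
    opening_probe = PROBES[dimension]
    # ...run the full probing sequence for this dimension here...
```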

Section 5: Comparative Evaluation (5-8 minutes)

If testing multiple concepts, this section manages the comparison. If testing a single concept, this section compares against the participant’s current solution.

For multi-concept comparison:

  • Present the second concept and repeat Section 3 (first reaction) in abbreviated form
  • Then ask directly: “Thinking about both concepts, which one would you be more likely to try? Walk me through that choice.”
  • Probe the trade-offs: “What does [Concept A] offer that [Concept B] does not, and vice versa?”

For single-concept comparison to status quo:

  • “Compared to how you currently handle [category need], how does this stack up?”
  • “What would this need to offer for you to switch from what you use now?”

Section 6: Improvement and Close (3-5 minutes)

End with forward-looking improvement questions and a final assessment:

  • “If you could change one thing about this concept, what would it be?”
  • “What is missing that would make this a must-have for you?”
  • “On reflection, how would you summarize your overall feeling about this idea?”

The closing summary question often produces the most quotable and analytically useful response of the entire interview, because the participant has had 30 minutes to process their thoughts.

Writing Non-Leading Questions


Leading questions are the most common source of contaminated concept test data. A question is leading when it contains the answer, implies a desired response, or uses loaded language.

  • Avoid: “Don’t you think this is innovative?” → Use: “How would you describe this compared to what exists?”
  • Avoid: “Would you agree that this solves the problem of…?” → Use: “What problem, if any, does this address for you?”
  • Avoid: “How much do you love this feature?” → Use: “What is your reaction to this feature?”
  • Avoid: “This is designed to save you time. How important is that?” → Use: “What benefits, if any, do you see in this?”

The principle: let the participant supply the evaluative language. Your questions provide the topic; their answers provide the judgment.
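
A rough lint pass can catch the most common offenders while a guide is still being drafted. This is a heuristic sketch, not a substitute for human review; the trigger patterns are illustrative and drawn from the examples above.

```python
# Heuristic check for leading questions. Patterns are illustrative.
import re

LEADING_PATTERNS = [
    r"\bdon'?t you\b",                   # presumes agreement
    r"\bwould you agree\b",              # presumes agreement
    r"\bhow much do you (love|like)\b",  # presupposes a positive reaction
    r"\b(innovative|amazing|great)\b",   # evaluative language supplied by the asker
    r"this is designed to",              # explains the benefit before asking about it
]

def leading_flags(question: str) -> list[str]:
    q = question.lower()
    return [p for p in LEADING_PATTERNS if re.search(p, q)]

assert leading_flags("Don't you think this is innovative?")
assert not leading_flags("How would you describe this compared to what exists?")
```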

Laddering: Going 5-7 Levels Deep


Laddering is the technique that separates concept testing depth interviews from concept testing surveys. The goal is to move from surface reaction to underlying motivation through progressive “why” probing.

A laddering sequence on appeal:

  1. Surface: “What appeals to you?” — “The convenience.”
  2. Define: “When you say convenience, what specifically do you mean?” — “I wouldn’t have to go to the store.”
  3. Context: “How often does going to the store create a problem for you?” — “Every week. It takes my whole Saturday morning.”
  4. Value: “What would you do with that Saturday morning time?” — “Spend it with my kids.”
  5. Emotional core: “How does that trade-off—store time versus kid time—make you feel about your current routine?” — “Guilty, honestly.”
  6. Decision weight: “How much would resolving that guilt factor into choosing a product like this?” — “It would be the main reason.”
  7. Behavioral threshold: “What would this product need to deliver for you to actually make the switch?” — “Reliable delivery by Friday evening.”

Seven levels took us from “it is convenient” to “reliable Friday delivery resolves parental guilt about lost weekend time.” The first answer is useless for product decisions. The seventh is actionable.

AI moderation is particularly strong at laddering because it follows the probing sequence consistently without social fatigue. Human moderators often stop at level 3-4 because continued probing can feel awkward. AI moderators, operating on User Intuition’s platform, execute the full 5-7 level sequence across every participant and every topic.
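
As a sketch of how an automated moderator might drive the full sequence: `ask` below is a hypothetical callable that poses a question and returns the participant's answer, and the prompts paraphrase the seven levels above. This is an illustration of the technique, not User Intuition's actual implementation.

```python
# A sketch of an automated 7-level laddering pass.
LADDER_PROMPTS = [
    "What appeals to you?",                                           # 1. surface
    "When you say {theme}, what specifically do you mean?",           # 2. define
    "How often does that create a problem for you?",                  # 3. context
    "What would you do with that time or benefit instead?",           # 4. value
    "How does that trade-off make you feel?",                         # 5. emotional core
    "How much would that factor into choosing a product like this?",  # 6. decision weight
    "What would this need to deliver for you to actually switch?",    # 7. behavioral threshold
]

def ladder(ask, max_depth: int = 7):
    theme = ""
    transcript = []
    for level, template in enumerate(LADDER_PROMPTS[:max_depth], start=1):
        prompt = template.format(theme=theme)  # carry the participant's words forward
        answer = ask(prompt)
        transcript.append((level, prompt, answer))
        theme = answer
    return transcript
```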

Adapting the Guide for Different Concept Types


The six-section structure holds, but probing emphasis shifts (see the sketch after this list):

  • Product concepts: Emphasize functional benefit probing and usage context
  • Packaging concepts: Emphasize visual hierarchy, shelf standout, and information sufficiency
  • Service concepts: Emphasize process understanding, trust signals, and perceived effort
  • Messaging concepts: Emphasize comprehension, believability, and emotional resonance
  • Pricing concepts: Emphasize value perception, reference pricing, and willingness-to-pay (see the pricing concept testing guide)
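
Encoded as a lookup table, those emphasis shifts look like this; the keys and values come straight from the list, and the encoding itself is just an illustrative convenience.

```python
# Concept type mapped to probing emphasis, per the list above.
PROBING_EMPHASIS = {
    "product":   ["functional benefits", "usage context"],
    "packaging": ["visual hierarchy", "shelf standout", "information sufficiency"],
    "service":   ["process understanding", "trust signals", "perceived effort"],
    "messaging": ["comprehension", "believability", "emotional resonance"],
    "pricing":   ["value perception", "reference pricing", "willingness-to-pay"],
}
```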

Common Discussion Guide Mistakes


Starting with the concept. Skipping rapport and category context means you cannot interpret reactions. A participant who says “this is fine” means something very different if they are satisfied with current options versus actively frustrated.

Asking closed questions too early. “Do you like this?” produces a yes/no that shuts down exploration. Always start with open probes and save structured rating questions for the end.

Over-loading the guide. Trying to cover 10 topics in 30 minutes means you cover nothing deeply. Prioritize 3-4 dimensions for deep probing and accept surface-level coverage on the rest.

Writing the guide for stakeholders, not participants. Internal jargon, technical feature names, and business-speak do not belong in participant-facing questions. Write in the language your participants use.

Neglecting transition language. Abrupt topic shifts feel like an interrogation. Include brief transitions: “Now I would like to shift and talk about…” or “That is really helpful. Let me ask about a different aspect.”

For question-level guidance, the concept testing questions reference provides a bank of tested probes organized by dimension. The concept testing template offers a ready-to-customize discussion guide framework.

Frequently Asked Questions

What are the six sections of a concept testing discussion guide?

The six-section structure is: rapport building (establishing comfort and context before introducing the concept), stimulus presentation (presenting the concept without leading commentary), first reaction (capturing unprompted initial response), structured probing (exploring specific dimensions in depth), comparison (evaluating the concept relative to alternatives if applicable), and improvement (asking what would need to change to increase appeal). The sequence is designed to capture genuine first reactions before analytical probing contaminates the initial response.

What makes a question non-leading?

Non-leading questions avoid implying a preferred answer, signaling the researcher’s hypothesis, or using evaluative language that frames how the participant should feel. “What comes to mind when you look at this?” is non-leading; “What do you think about this interesting new approach?” is leading. The test for a non-leading question is whether it could produce a negative, positive, or neutral response with equal ease—if only positive responses are grammatically natural, the question is leading.

What is laddering, and how deep should it go?

Laddering is a probing sequence that repeatedly asks “why does that matter to you?” or “what would that mean for you?” after each answer, progressively moving from surface attribute preferences to underlying values. Research on effective laddering indicates that 5-7 levels of probing are typically needed to reach stable, underlying motivations—the actual decision drivers that explain why consumers prefer or reject a concept. Stopping at level 2 or 3 leaves the most strategically useful insight unreached.

How does an AI moderator execute a discussion guide?

User Intuition’s AI moderator executes the six-section structure with consistent neutrality across all participants—applying non-leading question framing without the variability that human moderators introduce. The adaptive probing engine pursues laddering sequences in real time, following each participant’s response thread to 5-7 levels rather than advancing to the next scripted question after a single follow-up. This produces consistent discussion guide execution at scale.