product-discovery
Use this skill when applying Jobs-to-be-Done, building opportunity solution trees, mapping assumptions, or validating product ideas. Triggers on product discovery, JTBD, jobs-to-be-done, opportunity solution trees, assumption mapping, experiment design, prototype testing, and any task requiring product discovery methodology.
product-discovery is a production-ready AI agent skill for claude-code, gemini-cli, openai-codex. It covers applying Jobs-to-be-Done, building opportunity solution trees, mapping assumptions, and validating product ideas.
Quick Facts
| Field | Value |
|---|---|
| Category | product |
| Version | 0.1.0 |
| Platforms | claude-code, gemini-cli, openai-codex |
| License | MIT |
How to Install
- Make sure you have Node.js installed on your machine.
- Run the following command in your terminal:
npx skills add AbsolutelySkilled/AbsolutelySkilled --skill product-discovery
- The product-discovery skill is now available in your AI coding agent (Claude Code, Gemini CLI, OpenAI Codex, etc.).
Overview
Product discovery is the ongoing practice of learning what to build before - and while - building it. The goal is to reduce risk: shipping the wrong thing is far more expensive than the research that would have prevented it. This skill covers Jobs-to-be-Done (JTBD), opportunity solution trees, assumption mapping, experiment design, and prototype testing - giving an agent the judgment to run rigorous discovery the way a senior product manager or product trio would.
Tags
product-discovery jtbd opportunity-trees assumptions validation
Platforms
- claude-code
- gemini-cli
- openai-codex
Frequently Asked Questions
What is product-discovery?
product-discovery is an AI agent skill for applying Jobs-to-be-Done, building opportunity solution trees, mapping assumptions, and validating product ideas. It triggers on product discovery, JTBD, opportunity solution trees, assumption mapping, experiment design, prototype testing, and any task requiring product discovery methodology.
How do I install product-discovery?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill product-discovery in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support product-discovery?
This skill works with claude-code, gemini-cli, openai-codex. Install it once and use it across any supported AI coding agent.
Maintainers
Generated from AbsolutelySkilled
SKILL.md
Product Discovery
Product discovery is the ongoing practice of learning what to build before - and while - building it. The goal is to reduce risk: shipping the wrong thing is far more expensive than the research that would have prevented it. This skill covers Jobs-to-be-Done (JTBD), opportunity solution trees, assumption mapping, experiment design, and prototype testing - giving an agent the judgment to run rigorous discovery the way a senior product manager or product trio would.
When to use this skill
Trigger this skill when the user:
- Asks how to apply Jobs-to-be-Done or conduct JTBD interviews
- Wants to build or review an opportunity solution tree
- Needs to map, categorize, or prioritize assumptions
- Is designing an experiment, A/B test, or validation study
- Wants to run or evaluate prototype tests (concept, usability, or value)
- Asks how to synthesize qualitative or quantitative discovery data
- Needs to establish a discovery cadence or dual-track workflow
- Is deciding between multiple product bets or solution directions
Do NOT trigger this skill for:
- Pure delivery execution (sprint planning, story writing, velocity - use agile-scrum)
- Growth hacking or marketing experimentation (use a growth or marketing skill)
Key principles
Discover continuously, not in phases - Discovery is not a gate before delivery. It runs in parallel with shipping. Every sprint produces both validated learning and working software. "Done with discovery" is a warning sign.
Outcomes over outputs - The goal is a measurable change in customer behavior, not a feature shipped. Define success as a behavioral outcome first; the solution is just a hypothesis about how to reach it.
Test assumptions, not ideas - Every solution idea rests on a stack of assumptions. Surface the riskiest ones first and test those - not the idea in its entirety. Testing a single assumption is typically an order of magnitude faster than validating the whole idea.
Smallest experiment possible - Always ask: "What is the cheapest, fastest way to learn whether this assumption is true?" A 5-minute interview, a smoke test, or a paper prototype can invalidate months of engineering work.
Dual-track: discovery and delivery in parallel - One track discovers the next problem worth solving; the other delivers on already-validated solutions. Teams that separate these into sequential phases go dark on learning for months at a time.
Core concepts
JTBD Framework
Jobs-to-be-Done treats customer behavior as hiring a product to do a job. The canonical JTBD statement is:
"When [situation], I want to [motivation], so I can [expected outcome]."
Jobs have three layers:
- Functional job - The practical task (file my taxes quickly)
- Emotional job - How the customer wants to feel (confident I won't get audited)
- Social job - How they want to be perceived (look responsible to my partner)
Strong solutions address all three layers. Most competitors only address the functional job, leaving emotional and social value uncaptured.
Interview for jobs by asking about the last time the customer did the relevant behavior - not hypotheticals. "Tell me about the last time you..." surfaces actual pull, struggle, and workaround data.
Opportunity Solution Trees
The opportunity solution tree (OST) - developed by Teresa Torres - is a visual tool that maps the path from a desired outcome to the experiments that test candidate solutions.
Desired Outcome
+-- Opportunity 1 (unmet need / pain / desire)
| +-- Solution A
| | +-- Assumption 1 --> Experiment
| | +-- Assumption 2 --> Experiment
| +-- Solution B
| +-- Assumption 3 --> Experiment
+-- Opportunity 2
+-- ...
Key rules:
- The root is always an outcome (metric), never a solution
- Opportunities are discovered from customers - not invented in the office
- Each solution sits below a single opportunity - never jump to solution without an opportunity
- Every solution has at least one assumption being actively tested
Assumption Types
Every product bet rests on four categories of assumptions:
| Type | Question it answers | Example |
|---|---|---|
| Desirability | Do customers want this? | "Users want to share playlists with non-subscribers" |
| Viability | Can we make money from it? | "Enterprise customers will pay $50/seat for SSO" |
| Feasibility | Can we build it? | "We can infer intent from existing event data" |
| Usability | Can customers use it without friction? | "Users can complete onboarding without a tooltip" |
Prioritize assumptions by: risk x proximity to a decision. Test the assumption that, if wrong, would kill the bet - before testing assumptions about optimization.
Experiment Hierarchy
From lowest to highest fidelity and cost:
- Assumption audit - List and stack-rank assumptions; no customer contact yet
- Secondary research - Existing data, competitor analysis, academic studies
- Customer interview - 30-60 min; 5-8 participants for a theme to emerge
- Survey - Quantifies frequency of a qualitatively discovered pattern
- Smoke test / landing page - Measures real intent without building the feature
- Wizard of Oz - Manual fulfillment behind a product interface
- Prototype test - Simulates the experience at chosen fidelity (paper, lo-fi, hi-fi)
- Concierge MVP - Deliver the value manually; learn the job deeply
- Technical spike - Validate feasibility assumption with a time-boxed build
- A/B test / live experiment - Measures actual behavior change in production
See references/experiment-playbook.md for templates by assumption type.
Common tasks
Conduct JTBD interviews
Framework (45-60 min):
Recruitment - Screen for people who have recently done the behavior you're studying. Recent = within 90 days. Avoid future-intent screening questions.
Timeline reconstruction (20 min) - "Walk me through everything that happened from the moment you first realized you needed [solution category] to the moment you made a decision." Map: first thought, passive looking, active looking, deciding.
Dig into the struggle (15 min) - "What had you tried before? What was unsatisfying? What almost made you not switch?"
Outcomes and anxieties (10 min) - "What were you hoping would be different? What were you worried might not work?"
Wrap (5 min) - "If you could change one thing about [product], what would it be?" Use sparingly - this is ideation, not discovery.
Output: Job stories, struggle patterns, and switch triggers. Theme across 5+ interviews before drawing conclusions.
Build an opportunity solution tree
Start with the outcome - Name the metric the product trio owns this quarter, e.g., "Increase week-2 retention from 42% to 55%."
Generate opportunities from interview data - Each opportunity is an unmet need, pain, or desire expressed by a real customer. Do not invent opportunities in workshops.
Cluster and name - Group related struggles. Name them as customer problems ("I lose context when switching devices"), not solutions ("add cross-device sync").
Select the focus opportunity - Use impact/confidence/ease to compare. Pick one.
Brainstorm solutions - Generate 3+ candidate solutions per opportunity. Quantity over quality at this stage. Include unconventional ideas.
Map assumptions per solution - For each candidate, list what must be true for it to work. Sort by type (desirability/viability/feasibility/usability).
Design one experiment per risky assumption - Smallest test that could change your mind. Assign owner and timeline.
Map and prioritize assumptions
Use a 2x2 matrix: Certainty (known vs. unknown) x Risk (low vs. high).
- High risk, low certainty - Test immediately. These are bet-killers.
- High risk, high certainty - Monitor. You believe these but should revisit if evidence shifts.
- Low risk, low certainty - Research when convenient. Won't kill the bet.
- Low risk, high certainty - Ignore for now.
For each risky assumption, write a falsifiable statement: "We believe X. We will know this is true when we see Y. We will know it is false when we see Z."
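The 2x2 triage and the falsifiable-statement template can be expressed as two small helpers. A minimal sketch; the quadrant labels follow the list above and the function names are my own.

```python
def triage(risk: str, certainty: str) -> str:
    """Place an assumption in the Certainty x Risk 2x2 described above."""
    actions = {
        ("high", "low"): "test immediately - bet-killer",
        ("high", "high"): "monitor - revisit if evidence shifts",
        ("low", "low"): "research when convenient",
        ("low", "high"): "ignore for now",
    }
    return actions[(risk, certainty)]

def falsifiable(belief: str, true_signal: str, false_signal: str) -> str:
    """Render the falsifiable-statement template for a risky assumption."""
    return (f"We believe {belief}. We will know this is true when we see "
            f"{true_signal}. We will know it is false when we see {false_signal}.")
```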
Design validation experiments
Match the experiment type to the assumption category:
| Assumption type | Preferred experiment | Signal to look for |
|---|---|---|
| Desirability | Customer interview, smoke test | Pull signals + click-through rate |
| Viability | Pricing interview, willingness-to-pay study | 20%+ "definitely would pay" at target price |
| Feasibility | Technical spike, data audit | Can be built within X sprints |
| Usability | Usability test (think-aloud) | Task completion rate, errors, time-on-task |
Every experiment needs: hypothesis, method, sample size, success criterion, and a kill threshold - the result that would lead you to abandon the bet.
See references/experiment-playbook.md for detailed templates.
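The required experiment fields can be captured as a record that refuses to exist without a pre-committed kill threshold, making post-hoc rationalization structurally harder. Field names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentBrief:
    hypothesis: str
    method: str             # e.g. "smoke test", "technical spike"
    sample_size: int
    success_criterion: str  # e.g. ">= 30% click-through"
    kill_threshold: str     # the result that means: abandon the bet

    def __post_init__(self):
        # A brief without a kill threshold is validation theater.
        if not self.kill_threshold.strip():
            raise ValueError("Define the kill threshold before collecting data")
```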
Run prototype tests
Choose fidelity based on what you're testing:
| Fidelity | Best for | Tools |
|---|---|---|
| Paper / sketch | Flow and information architecture | Pen, Balsamiq |
| Lo-fi wireframe | Navigation and content hierarchy | Figma (no styling) |
| Hi-fi mockup | Visual design and emotional response | Figma, Framer |
| Coded prototype | Interaction quality, performance perception | Storybook, CodeSandbox |
| Production feature | Behavior change, retention, conversion | Feature flag in prod |
Think-aloud protocol: Brief the participant ("we're testing the design, not you"), ask them to narrate thoughts as they navigate, do not hint or help, note confusion and errors, debrief after each task. Five participants reveal ~85% of usability issues.
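The "~85% with five participants" figure comes from the standard usability-problem discovery model, P(found) = 1 - (1 - p)^n, with p = 0.31 as the commonly cited per-participant detection probability (an assumption; real detection rates vary by product and task).

```python
def share_of_issues_found(n_participants: int, p_detect: float = 0.31) -> float:
    """Expected share of usability issues seen by at least one participant."""
    return 1 - (1 - p_detect) ** n_participants

# round(share_of_issues_found(5), 2) -> 0.84, i.e. ~85% with five participants
```

The curve flattens quickly: the marginal participant after the fifth reveals little, which is why iterating on a fixed small panel beats one large study.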
Synthesize discovery insights
Structure synthesis as: observation - pattern - insight - implication.
- Observation - What one customer said or did (raw data)
- Pattern - What appeared across multiple customers (theme)
- Insight - Why this pattern exists (interpretation)
- Implication - What it means for the product (decision input)
Avoid jumping from observation to implication. The missing middle is where discovery adds value over anecdote.
Affinity mapping: Write each observation on its own sticky. Group silently. Name groups as customer problems, not solutions. Rank by frequency and intensity of pain.
Create a discovery cadence for the team
A sustainable cadence for a three-person product trio (PM, designer, engineer):
| Cadence | Activity | Time |
|---|---|---|
| Weekly | 2-3 customer interviews or usability sessions | 2-3 hrs |
| Weekly | Assumption review: what did we learn, what changed? | 30 min |
| Bi-weekly | OST review: update tree with new opportunities and learnings | 1 hr |
| Monthly | Opportunity prioritization: re-rank based on new evidence | 1 hr |
| Quarterly | Outcome review: did we move the metric? What next? | 2 hrs |
Talking to 2-3 customers per week, compounded over a year, builds an understanding advantage that teams who research in batches cannot match.
Anti-patterns
| Anti-pattern | Why it's harmful | What to do instead |
|---|---|---|
| Big-bang discovery | 6-week research phase before a project; team goes dark on learning during delivery | Embed 2-3 interviews per week alongside shipping; discovery never stops |
| Solution-first OST | Listing features at the root of the tree instead of an outcome | Always start with a measurable outcome metric; solutions are hypotheses |
| Validation theater | Running research to confirm a decision already made; cherry-picking supporting quotes | Write a kill threshold before the study: the result that would change your mind |
| Over-fitting to one customer | Pivoting strategy based on feedback from a single vocal customer | Require a pattern across 5+ independent sources before changing direction |
| Premature high-fidelity | Pixel-perfect prototypes before validating the core job | Match fidelity to the assumption; paper prototypes can kill 80% of bad ideas cheaply |
| Skipping feasibility | Testing only desirability; engineering discovers a blocker in sprint 3 | Include an engineer in discovery; run a technical spike for any novel feasibility assumption |
Gotchas
Recruiting interviewees through your own app produces selection bias - Users who respond to an in-app recruitment banner are your most engaged advocates. They will tell you the product is great and suggest incremental improvements. To discover why users churn or never activate, you must recruit from people who did not engage - churned users, trial non-converters, and target-persona non-users. Use external recruitment panels for discovery that needs unbiased signal.
Opportunity solution trees built in workshops produce solutions disguised as opportunities - When teams generate the OST collaboratively in a room, "opportunities" are often features rephrased as problems ("users want a better export experience" is a solution frame, not an opportunity). Real opportunities come from verbatim customer language captured in interviews, not from workshop sticky notes. Build the OST from interview data, not from team hypotheses.
Smoke tests measure intent to click, not willingness to pay or actual use - A high click-through rate on a "coming soon" landing page is a desirability signal, not a conversion signal. Users who click are curious; they have not committed to changing behavior, paying, or integrating the feature into their workflow. Smoke tests invalidate "no one wants this" but do not validate "people will pay and retain."
Using a high-fidelity prototype for flow testing anchors users on visual design - When a prototype looks production-ready, participants comment on button colors and copy instead of navigating authentically and revealing flow problems. For testing information architecture and navigation, deliberately use lo-fi wireframes. Reserve hi-fi prototypes for testing emotional response and design quality.
Kill thresholds defined after the experiment results are in are rationalization, not rigor - If you decide what "failure looks like" after you see the data, you will unconsciously set the threshold to preserve your preferred conclusion. Write the kill threshold - the specific metric result that would cause you to abandon or pivot the bet - in the experiment design document before the study begins.
References
references/experiment-playbook.md - Experiment templates by assumption type with success criteria, sample sizes, and analysis guidance
Reference files
assumption-testing.md
Assumption Testing
This reference covers the full catalog of discovery experiment types, organized by which assumption category they test. Use it to pick the fastest, cheapest experiment for the riskiest assumption on your opportunity solution tree.
The assumption matrix
Before choosing an experiment, classify assumptions on two dimensions:
HIGH IMPORTANCE
(if wrong, solution fails)
|
Leap-of-faith | Known risk
TEST THESE FIRST | Test these second
|
LOW EVIDENCE ----------+---------- STRONG EVIDENCE
|
Irrelevant | Table stakes
Deprioritize | Monitor, don't test
|
LOW IMPORTANCE
(if wrong, solution still works)
Focus your experiment budget on the upper-left quadrant: high importance, low evidence. These are leap-of-faith assumptions.
Experiment catalog by assumption type
Desirability experiments
Test whether customers actually want the solution.
| Experiment | How it works | Duration | Sample size | Signal strength |
|---|---|---|---|---|
| Fake door test | Add a button/link for the feature that leads to a "coming soon" page. Measure click rate. | 3-7 days | 500+ impressions | Medium - shows interest, not commitment |
| Landing page test | Create a landing page describing the value prop. Measure sign-up or waitlist conversion. | 5-14 days | 1000+ visitors | Medium-high - real behavior on perceived real offering |
| Concierge test | Manually deliver the solution to 5-10 users. Observe engagement and willingness to continue. | 1-2 weeks | 5-10 users | High - real usage, real feedback |
| Wizard of Oz | Users interact with what looks like a working product, but a human performs the backend work. | 1-2 weeks | 10-20 users | High - tests full experience without building |
| Pre-order / LOI | Ask customers to pre-order or sign a letter of intent. Money or commitment on the table. | 1-2 weeks | 10-50 prospects | Very high - willingness to pay is the strongest signal |
| One-question survey | Ask existing users: "How disappointed would you be if you could no longer use [feature]?" | 2-3 days | 100+ responses | Medium - Sean Ellis test for product-market fit |
Viability experiments
Test whether the business model works.
| Experiment | How it works | Duration | Sample size | Signal strength |
|---|---|---|---|---|
| Pricing page test | Show different price points to different cohorts and measure conversion. | 1-2 weeks | 500+ per variant | High - real purchasing behavior |
| Willingness-to-pay survey | Van Westendorp price sensitivity meter: ask 4 pricing questions to find acceptable range. | 3-5 days | 50-100 respondents | Medium - stated preference, not revealed |
| Unit economics model | Build a spreadsheet: CAC, LTV, margin, payback period. Stress-test with pessimistic inputs. | 1 day | N/A (analytical) | Low-medium - depends on input quality |
| Partner/vendor feasibility call | Call potential partners or vendors to confirm terms, pricing, and integration requirements. | 1-3 days | 3-5 calls | Medium - verbal commitments are soft |
Feasibility experiments
Test whether the team can build it.
| Experiment | How it works | Duration | Sample size | Signal strength |
|---|---|---|---|---|
| Technical spike | Engineer builds the riskiest technical component in isolation. Time-boxed. | 1-3 days | N/A | High - proves or disproves the technical hypothesis |
| API prototype | Build a minimal API that handles the core data flow. No UI. | 2-5 days | N/A | High - surfaces integration issues early |
| Third-party evaluation | Evaluate vendor APIs, libraries, or services for the key capability. | 1-2 days | 3-5 vendors | Medium-high - reveals real constraints |
| Data audit | Check whether the data needed for the solution exists, is accessible, and is clean enough. | 1 day | N/A | High - data problems kill more features than code problems |
Usability experiments
Test whether users can figure out the solution.
| Experiment | How it works | Duration | Sample size | Signal strength |
|---|---|---|---|---|
| Paper prototype test | Sketch the key screens on paper or whiteboard. Walk 5 users through the flow. | 1 day | 5 users | Medium - tests concept and flow, not visual design |
| Clickable prototype test | Build a Figma/Framer prototype. Run moderated usability tests with think-aloud protocol. | 3-5 days | 5-8 users | High - realistic interaction patterns |
| First-click test | Show users a screen and ask: "Where would you click to [accomplish task]?" Measure accuracy. | 1-2 days | 20-50 users | Medium - fast signal on navigation and layout |
| Five-second test | Show a screen for 5 seconds. Ask what the user remembers and what they think it does. | 1 day | 20-30 users | Low-medium - tests clarity of value proposition |
| Unmoderated task test | Record users completing specific tasks in the prototype without a facilitator. | 3-5 days | 10-20 users | Medium-high - natural behavior, larger sample |
Designing experiments that can fail
The most common failure mode in discovery is designing experiments that confirm the idea regardless of the result. Every experiment must have:
1. A falsifiable hypothesis
We believe that [specific user segment]
will [specific measurable behavior]
when [specific condition/change is introduced]
because [rationale from customer evidence].
2. Pre-committed success criteria
Define before data collection:
- Success threshold: "If >X% of users do Y, we proceed"
- Kill threshold: "If <Z% of users do Y, we stop"
- Ambiguous zone: "If between Z% and X%, we need more data"
3. Decision rules
| Result | Action |
|---|---|
| Above success threshold | Move solution to delivery backlog; test next riskiest assumption |
| In ambiguous zone | Design a more targeted experiment; increase sample size |
| Below kill threshold | Archive the solution; return to the opportunity and try a different solution |
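The decision rules above reduce to a small function once the thresholds are pre-committed in the experiment brief. The strict inequalities follow the template ("If >X% ... If <Z%"); boundary values fall in the ambiguous zone.

```python
def decide(result: float, success_threshold: float, kill_threshold: float) -> str:
    """Apply pre-committed decision rules to an observed experiment result:
    above success proceeds, below kill stops, in between needs more data."""
    if result > success_threshold:
        return "proceed"
    if result < kill_threshold:
        return "stop"
    return "ambiguous"
```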
Experiment sequencing
Not all assumptions need testing. Use this decision tree:
Is this assumption high-importance?
|
NO --> Skip (even if wrong, solution still works)
YES --> Do we have strong evidence already?
|
YES --> Monitor, don't test
NO --> Is it a desirability assumption?
|
YES --> Test FIRST (if nobody wants it, nothing else matters)
NO --> Test SECOND (viability, feasibility, usability)
General sequencing rule: Desirability > Viability > Usability > Feasibility
Rationale: Desirability failures are the most expensive to discover late. Feasibility failures are the cheapest - engineers usually know within days whether something is buildable. Test in order of "cost of being wrong late."
Worked example
Context: A budgeting app team discovers that users struggle to track subscription charges across multiple payment methods.
Solution idea: Auto-detect and consolidate all recurring charges into a single "Subscriptions" view.
Assumptions identified:
| # | Assumption | Category | Importance | Evidence |
|---|---|---|---|---|
| A1 | Users want to see all subscriptions in one place | Desirability | High | Weak (3/6 interviews mentioned it) |
| A2 | We can reliably detect recurring charges from transaction data | Feasibility | High | Weak (untested algorithm) |
| A3 | Users will check the subscription view at least monthly | Desirability | Medium | None |
| A4 | Consolidation reduces churn from the budgeting app | Viability | Medium | None |
| A5 | Users can understand the categorization without explanation | Usability | Medium | None |
Testing order:
- A1 (desirability, high importance, weak evidence) - Fake door test
- A2 (feasibility, high importance, weak evidence) - Technical spike
- A5 (usability, medium importance, no evidence) - Paper prototype test
- A3 and A4 can be measured post-launch with analytics
Experiment for A1:
EXPERIMENT BRIEF
================
Assumption: Users want to see all subscriptions in one place (A1)
Experiment type: Fake door test
Setup: Add a "Subscriptions" tab to the main navigation for 50% of
users. Clicking it shows a "Coming soon - join the waitlist" modal.
Participants: 50% of active users (A/B split), ~2000 users per arm
Success metric: Waitlist sign-up rate among those who see the tab
Kill criteria: If <3% of users who see the tab click it within 7 days,
the demand signal is too weak to pursue.
Success criteria: If >8% of users click and >40% of clickers join the
waitlist, proceed to feasibility spike (A2).
Timeline: 7 days
Decision:
If success: Run technical spike for A2
If kill: Return to opportunity; explore other solution ideas
experiment-playbook.md
Experiment Playbook
Every product experiment should be matched to the type of assumption being tested. Using a high-fidelity experiment for a low-certainty desirability assumption wastes weeks. Using a quick interview for a feasibility question produces noise. This playbook maps assumption categories to the right experiment type with templates, sample sizes, and success criteria.
Universal Experiment Brief Template
Fill this out before running any experiment. Define the kill threshold before collecting data - post-hoc rationalization is the enemy of real learning.
EXPERIMENT BRIEF
================
Assumption:
[One specific, falsifiable belief you are testing]
Assumption type:
[ ] Desirability [ ] Viability [ ] Feasibility [ ] Usability
Experiment type:
[Customer interview / smoke test / survey / prototype test /
technical spike / wizard of oz / concierge / A/B test]
Setup:
[What to build or prepare - keep it minimal]
Participants:
Who: [Customer segment]
How many: [Minimum to reach signal]
Recruited via: [Source]
Success metric:
[One quantitative signal - task completion rate, click-through, etc.]
Success threshold:
[e.g., >= 30% click-through, >= 4/5 task completions]
Kill threshold:
[e.g., < 15% click-through, < 2/5 task completions]
Timeline:
Start: [Date] End: [Date] Max duration: [1-2 weeks]
Decision rules:
If success threshold is met: [next step]
If kill threshold is hit: [pivot or stop]
If results are ambiguous: [follow-up action]
Desirability Experiments
Question answered: Do customers want this? Will they change behavior to get it?
Desirability is usually the first assumption to test. If nobody wants it, viability and feasibility are irrelevant.
Customer Interview (Switch Interview)
Best for early-stage discovery when you have not yet narrowed to a solution.
When to use: Before building anything. You want to understand struggle and pull.
Setup:
- 45-60 min per session
- Recruit people who have recently done the behavior you're studying (within 90 days)
- Use the timeline reconstruction method (see SKILL.md)
Sample size: 5-8 participants per customer segment. Stop when you stop hearing new themes (theoretical saturation).
Signal: Frequency and intensity of struggle mentions. Look for:
- Workarounds ("I end up having to...")
- Switch stories ("I switched from X because...")
- Emotional language ("It drives me crazy when...")
Kill signal: 0 of 5 participants express the struggle you hypothesized. Revisit your opportunity framing before testing a solution.
Template - Key questions:
JTBD Interview Guide
====================
Opening (5 min):
"Tell me a bit about your role and how you [domain area] day-to-day."
Timeline reconstruction (20 min):
"Walk me through the last time you [target behavior].
Start from the moment you first realized you needed to do it."
Dig into struggle (15 min):
"What had you tried before that didn't work?"
"What's the most frustrating part of the current process?"
"Have you ever paid for something to help with this? What happened?"
Outcomes and anxieties (10 min):
"What would your ideal world look like after this is solved?"
"What would make you nervous about switching to something new?"
Wrap (5 min):
"Is there anything about this topic I haven't asked that seems important?"
Smoke Test / Fake Door
Best for testing whether users will take a concrete action toward a not-yet-built feature.
When to use: After qualitative interviews reveal a pattern. You want to quantify intent with actual behavior, not reported preference.
Setup:
- Add a UI element (button, menu item, landing page) for the non-existent feature
- When a user clicks/signs up, show a "coming soon" screen and optionally collect email
- Track click-through rate and, if collecting emails, conversion to sign-up
Sample size: 200-500 unique visitors (or 2 weeks of live traffic, whichever comes first)
Success metric: Click-through rate on the fake door CTA
Typical benchmarks:
| CTR range | Signal |
|---|---|
| < 5% | Weak demand - revisit the opportunity |
| 5-15% | Moderate - investigate with follow-up interviews |
| 15-30% | Strong signal - proceed to solution design |
| > 30% | Very strong - prioritize immediately |
Kill threshold: Define before launch. Example: "If CTR < 8% after 300 visitors, we pause this solution and explore alternative opportunities."
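The benchmark bands above map directly to an interpretation helper. The band edges follow the table; how I assign the exact boundary values (15% and 30% land in "strong") is my own choice.

```python
def ctr_signal(ctr: float) -> str:
    """Interpret a fake-door click-through rate against the benchmark bands."""
    if ctr < 0.05:
        return "weak demand - revisit the opportunity"
    if ctr < 0.15:
        return "moderate - investigate with follow-up interviews"
    if ctr <= 0.30:
        return "strong signal - proceed to solution design"
    return "very strong - prioritize immediately"
```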
Template - Fake door copy:
Fake Door Copy Template
=======================
Headline: [Specific outcome promise, not feature name]
e.g., "Export reports directly to your accountant in one click"
Subhead: [Concrete evidence of value]
e.g., "Save 2 hours every month on manual data entry"
CTA: [Action-oriented, specific]
e.g., "Turn on auto-export" (not "Learn more")
Post-click: "This feature is coming soon. We're building it now.
Leave your email to be first in line."
Concept Test Survey
Best for quantifying a pattern already discovered qualitatively.
When to use: After 5+ interviews reveal a theme. You want to know "how many?" before committing engineering.
Setup:
- 5-8 questions max
- Lead with the struggle, then present the solution concept (text or low-fi image)
- Measure: problem frequency, current solution satisfaction, willingness to switch
Sample size: 100-300 responses (email list, in-app prompt, or panel)
Template - Core questions:
Survey Template
===============
Q1 (Frequency): How often do you [target struggle]?
[ ] Daily [ ] Weekly [ ] Monthly [ ] Rarely [ ] Never
Q2 (Pain intensity): How much of a problem is [struggle] for you?
1 (Not a problem) ---- 5 (Major problem)
Q3 (Concept): [2-sentence description of solution concept]
"Imagine if [product] could [value prop]. You would [specific action]."
Q4 (Desirability): How useful would this be for you?
1 (Not useful) ---- 5 (Extremely useful)
Q5 (Open): What would you need to see to trust this feature?
[Free text]
Success threshold: >= 40% rate the concept 4+ on usefulness AND struggle is 4+ intensity. Lower than this suggests the opportunity needs reframing.
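One reading of the joint threshold - at least 40% of respondents rating both pain intensity (Q2) and usefulness (Q4) at 4 or higher - can be checked like this. The response format is an assumption for illustration.

```python
def concept_passes(responses, threshold=0.40):
    """responses: list of (pain_intensity, usefulness) pairs on 1-5 scales.
    Passes when >= `threshold` of respondents rate BOTH at 4 or higher."""
    hits = sum(1 for pain, useful in responses if pain >= 4 and useful >= 4)
    return hits / len(responses) >= threshold
```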
Viability Experiments
Question answered: Can we build a sustainable business around this? Will customers pay, and at what price?
Willingness-to-Pay Interview
Best for B2B or prosumer products where pricing is a key unknown.
When to use: Before pricing a new tier or feature. After desirability is established.
Setup:
- 30-45 min interview
- Show a working prototype or detailed description
- Use the Van Westendorp Price Sensitivity Meter
Sample size: 15-25 participants (B2B); 30-50 (B2C)
Van Westendorp questions:
Van Westendorp Price Sensitivity
=================================
After showing the product/feature:
Q1 (Too cheap): "At what price would this be so cheap that
you'd question the quality?"
Q2 (Cheap but acceptable): "At what price would this be a
bargain - great value for money?"
Q3 (Getting expensive): "At what price would this start to
feel expensive, but you'd still consider it?"
Q4 (Too expensive): "At what price would this be so expensive
that you wouldn't consider buying it?"
Plot responses to find:
- Acceptable price range: intersection of "too cheap" and "too expensive" curves
- Optimal price point: intersection of "cheap but acceptable" and "getting expensive"
Kill signal: Optimal price point is below your required revenue per customer. Revisit the value proposition or target segment before investing further.
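A simplified version of the acceptable-range calculation can be sketched in code. A full Van Westendorp analysis interpolates four cumulative curves; the version below (an assumption-laden simplification, not the canonical method) uses only the Q1 and Q4 answers and grid-searches for the band where neither objection exceeds 50% of respondents:

```python
# Sketch: find the price band where fewer than half of respondents
# would call the price "too cheap" (Q1) or "too expensive" (Q4).

def acceptable_price_range(too_cheap, too_expensive, step=1.0):
    """too_cheap / too_expensive: per-respondent answers to Q1 and Q4."""
    n = len(too_cheap)
    lo, hi = min(too_cheap), max(too_expensive)
    acceptable = []
    p = float(lo)
    while p <= hi:
        pct_cheap = sum(1 for x in too_cheap if p <= x) / n
        pct_expensive = sum(1 for x in too_expensive if p >= x) / n
        if pct_cheap < 0.5 and pct_expensive < 0.5:
            acceptable.append(p)
        p += step
    return (acceptable[0], acceptable[-1]) if acceptable else None

q1 = [5, 8, 10, 6, 7]      # "too cheap" answers ($)
q4 = [25, 30, 40, 20, 35]  # "too expensive" answers ($)
print(acceptable_price_range(q1, q4))  # -> (8.0, 29.0)
```

With real data, plot all four curves rather than relying on a single band; the curve shapes often reveal distinct price-sensitive segments.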
Fake Pricing Page
Best for SaaS products testing tier structure and price sensitivity at scale.
Setup:
- Build a pricing page with real prices before the feature exists
- Add a CTA that collects intent (e.g., "Start free trial" or "Upgrade now")
- Intercept post-click with "coming soon" + waitlist signup
- Track: which tier gets the most clicks, overall conversion rate
Sample size: 500+ unique visitors or 3 weeks of traffic
Success metric: Conversion rate to "upgrade intent" click
Kill threshold: < 2% conversion on target tier after 500 visitors means the price or tier structure needs rework.
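Before acting on the kill threshold, it is worth checking whether 500 visitors is actually enough to distinguish the observed rate from 2%. The sketch below uses a Wilson score interval (standard for binomial proportions; the specific visit counts are illustrative):

```python
import math

# Sketch: Wilson score confidence interval for a conversion rate,
# used to sanity-check the "< 2% after 500 visitors" kill signal.

def wilson_interval(conversions, visitors, z=1.96):
    p = conversions / visitors
    denom = 1 + z**2 / visitors
    center = (p + z**2 / (2 * visitors)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / visitors
                                     + z**2 / (4 * visitors**2))
    return center - margin, center + margin

lo, hi = wilson_interval(7, 500)  # 1.4% observed conversion
print(f"{lo:.4f} .. {hi:.4f}")
# The upper bound is still above 0.02, so killing at 7/500 would be
# premature -- collect more traffic before deciding.
```

A threshold stated without a sample size is not a decision rule; the interval tells you when the data has actually resolved the question.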
Feasibility Experiments
Question answered: Can we actually build this? Do we have the data, APIs, compute, or skills required?
Technical Spike
Best for validating novel technical integrations or data assumptions.
When to use: When engineering believes there is meaningful uncertainty about whether something can be built, or built within acceptable performance bounds.
Setup:
- Time-box to 2-5 days
- One engineer, one specific question to answer
- Produce a decision artifact (code snippet, benchmark result, or written finding)
Template:
Technical Spike Brief
=====================
Question to answer:
[One specific technical question]
e.g., "Can we classify transaction categories with > 90% accuracy
using only the merchant name field from our existing data?"
Definition of done:
[Concrete deliverable]
e.g., "A notebook showing precision/recall on a sample of 1,000
real transactions from our production database"
Time box: [2-5 days]
Success signal: [Threshold to proceed]
e.g., "> 85% precision on unseen test set"
Kill signal: [Threshold to abandon this technical approach]
e.g., "< 70% precision OR requires data we don't have access to"
Data Audit
Best for testing the assumption that existing behavioral data can power a feature.
Setup:
- Define the exact data fields the solution requires
- Query existing data sources to measure completeness, quality, and availability
- Assess: coverage rate, recency, and accuracy on a sample
Template:
Data Audit Checklist
====================
Required data fields:
[ ] Field 1: [name, source, expected format]
[ ] Field 2: ...
For each field, measure:
Coverage rate: [% of records with this field populated]
Recency: [age of most recent record for typical user]
Accuracy check: [% correct on manually verified sample of 50]
Pass criteria: All required fields >= 80% coverage and >= 85% accuracy
Fail criteria: Any required field < 60% coverage -> infeasible as designed
Usability Experiments
Question answered: Can customers use this without friction? Can they complete the target task independently?
Think-Aloud Prototype Test
Best for testing navigation flows, information architecture, and task completion.
When to use: After a solution is designed but before engineering begins.
Setup:
- Prepare tasks (not leading questions - "find the export option", not "click export")
- Use Figma prototype, coded component, or paper sketch
- Record session (with consent)
Sample size: Testing with 5 participants reveals ~85% of usability issues. Run 5, fix the biggest issues, then run 5 more.
Facilitation template:
Think-Aloud Facilitation Guide
================================
Introduction (3 min):
"We're testing the design, not you. There are no wrong answers.
Please think out loud as you navigate - tell me what you're
looking at, what you're thinking, and what you expect to happen."
Per task:
Task prompt: [Action to complete without revealing how]
Observe: [Record hesitations, errors, and recovery attempts]
Post-task: "What did you expect to happen there?"
"What would you do next if this were real?"
Debrief (5 min):
"What was most confusing?"
"What felt natural or familiar?"
"Is there anything you expected to see that was missing?"
Metrics to track:
| Metric | How to measure | Target |
|---|---|---|
| Task completion rate | % of participants who complete without help | >= 80% |
| Time on task | Seconds from task start to completion | Benchmark vs. current state |
| Error rate | Number of wrong clicks before correction | <= 2 per task |
| Confusion moments | Count of pauses > 5 sec or backtracking | 0 per critical path |
Kill threshold: < 60% task completion rate OR 3+ participants cannot complete the primary task = redesign before engineering.
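The targets and kill thresholds above can be rolled into a single verdict per test round. The session fields below (`completed`, `errors`) are illustrative; adapt them to your own note-taking format:

```python
# Sketch: turn think-aloud session notes into a proceed/redesign verdict
# using the targets from the metrics table.

def usability_verdict(sessions):
    """sessions: list of dicts with 'completed' (bool) and 'errors' (int)."""
    n = len(sessions)
    completed = sum(s["completed"] for s in sessions)
    completion_rate = completed / n
    if completion_rate < 0.6 or (n - completed) >= 3:
        return "redesign", completion_rate      # kill threshold hit
    if completion_rate >= 0.8 and all(s["errors"] <= 2 for s in sessions):
        return "proceed", completion_rate       # all targets met
    return "fix-and-retest", completion_rate    # partial pass

sessions = [{"completed": True, "errors": 1},
            {"completed": True, "errors": 0},
            {"completed": False, "errors": 4},
            {"completed": True, "errors": 2},
            {"completed": True, "errors": 1}]
print(usability_verdict(sessions))  # 4/5 complete -> ("fix-and-retest", 0.8)
```

The middle verdict matters: an 80% completion rate with one high-error session means fix the specific issue and rerun, not ship.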
First-Click Test
Best for testing information architecture and labeling decisions quickly.
Setup:
- Show a static screenshot or wireframe
- Ask: "Where would you click to [accomplish task]?"
- Measure: where users click and how long it takes (< 5 sec = confident, > 15 sec = confused)
Sample size: 30-50 participants (can be done async via Maze, Optimal Workshop)
Success threshold: >= 70% correct first click on the target element.
Kill threshold: < 50% correct first click = rename the element or restructure navigation.
Experiment Selection Guide
Use this table to quickly choose the right experiment for your context:
| Assumption type | Time available | Best experiment | Min sample |
|---|---|---|---|
| Desirability | 1-3 days | 5 customer interviews | 5 people |
| Desirability | 1-2 weeks | Smoke test / fake door | 200 visitors |
| Desirability | 1 week | Concept test survey | 100 responses |
| Viability | 2-3 days | Willingness-to-pay interview | 15 people |
| Viability | 2-3 weeks | Fake pricing page | 500 visitors |
| Feasibility | 2-5 days | Technical spike | 1 engineer |
| Feasibility | 1 day | Data audit | Existing data |
| Usability | 1-3 days | Think-aloud test (5 users) | 5 people |
| Usability | 1 week | First-click test (async) | 30 people |
Decision rule: Always start with the fastest, cheapest experiment that could change your mind. Only escalate to a more expensive experiment if the cheaper one produces an ambiguous result.
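The selection table and decision rule can be encoded as a cheapest-first lookup, so the "start with the fastest experiment that could change your mind" rule falls out of a linear scan (this is just the table above restated as data, not a separate methodology):

```python
# Sketch: experiment selection table, ordered cheapest-first per
# assumption type so the first match is the fastest viable experiment.

EXPERIMENTS = [  # (assumption type, days needed, experiment, min sample)
    ("desirability", 3,  "5 customer interviews",        "5 people"),
    ("desirability", 7,  "concept test survey",          "100 responses"),
    ("desirability", 14, "smoke test / fake door",       "200 visitors"),
    ("viability",    3,  "willingness-to-pay interview", "15 people"),
    ("viability",    21, "fake pricing page",            "500 visitors"),
    ("feasibility",  1,  "data audit",                   "existing data"),
    ("feasibility",  5,  "technical spike",              "1 engineer"),
    ("usability",    3,  "think-aloud test (5 users)",   "5 people"),
    ("usability",    7,  "first-click test (async)",     "30 people"),
]

def pick_experiment(assumption_type, days_available):
    for kind, days, name, sample in EXPERIMENTS:
        if kind == assumption_type and days <= days_available:
            return name, sample
    return None  # no experiment fits the time budget

print(pick_experiment("desirability", 10))  # -> ("5 customer interviews", "5 people")
```

Escalation after an ambiguous result is just the same scan resumed from the next entry for that assumption type.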
jtbd-framework.md
Jobs-to-be-Done Framework
Job statement syntax
A well-formed job statement follows this structure:
When [situation/context],
I want to [motivation/desired progress],
so I can [expected outcome].
Rules for writing job statements:
- The situation must describe a real, recurring circumstance - not a one-off event
- The motivation must be solution-agnostic - describe the progress, not a feature
- The outcome should connect to a functional, emotional, or social benefit
- The statement should be testable: you can observe whether someone is in this situation
- Aim for the "Goldilocks level" - not too abstract ("live better") and not too specific ("click a button")
Example - good vs. bad:
| Quality | Statement |
|---|---|
| Too abstract | "When managing my life, I want to feel less stressed, so I can be happy." |
| Too specific | "When I open the app, I want to see a pie chart, so I can check my categories." |
| Just right | "When I receive my monthly paycheck, I want to quickly allocate it across bills, savings, and discretionary spending, so I can avoid overdrafts and make progress toward my savings goal." |
The three layers of a job
Every job has functional, emotional, and social dimensions. Interviewing for all three produces richer insight than functional alone.
Functional job
The practical task the customer needs to accomplish. This is what most teams focus on and what product requirements typically capture.
- Directly observable
- Measured by speed, accuracy, thoroughness
- Example: "Reconcile bank transactions with accounting records"
Emotional job
How the customer wants to feel during and after completing the job. Often the real driver of switching behavior - people leave products that make them feel anxious or incompetent even if the product technically works.
- Not directly observable - must be inferred from language and behavior
- Measured by confidence, satisfaction, peace of mind
- Example: "Feel confident that I am not missing any expenses at tax time"
Social job
How the customer wants to be perceived by others. Social jobs are especially powerful in B2B contexts where purchasing decisions are visible to colleagues and managers.
- Context-dependent - the same person has different social jobs at work vs. home
- Measured by perception, status, belonging
- Example: "Be seen by my CFO as someone who has the financials under control"
Switch interviews (forces of progress)
Switch interviews are retrospective interviews conducted after someone has recently switched to or from a product. They reveal the four forces that drive or inhibit switching:
DEMAND SIDE (pulls toward switch)
=================================
Push of current situation   +   Pull of new solution
(dissatisfaction)               (attraction)
vs.
SUPPLY SIDE (resists switch)
============================
Anxiety of new solution   +   Habit of current situation
(uncertainty)                 (inertia)
The four forces
| Force | Direction | Interview question |
|---|---|---|
| Push | Toward switching | "What was going wrong with your old way of doing this?" |
| Pull | Toward switching | "What attracted you to the new solution? What did you hope it would do?" |
| Anxiety | Against switching | "What concerns did you have before switching? What almost stopped you?" |
| Habit | Against switching | "What did you like about the old way? What was comfortable about it?" |
Running a switch interview
- Recruit recent switchers - People who switched within the last 90 days. Memory fades quickly; the emotional context disappears first.
- Timeline first - Map the chronological journey: first thought of switching, research phase, decision moment, first use, ongoing use. Get exact dates when possible.
- Probe each force - At each phase, ask about pushes, pulls, anxieties, and habits.
- Look for the "struggling moment" - The specific event that moved someone from passive dissatisfaction to active searching. This is your highest-value insight.
- Document the forces diagram - After the interview, fill in the four-forces diagram with direct quotes from the participant.
Patterns to look for across interviews
- Consistent pushes across 4+ participants indicate a market-level problem, not an individual complaint
- Strong anxieties suggest your marketing and onboarding must address specific fears
- Weak pulls mean your value proposition is not landing - people are leaving the old solution, not choosing yours
- Strong habits indicate you need migration tools, data import, or familiar UX patterns
Outcome-driven innovation (ODI)
ODI, developed by Tony Ulwick, quantifies jobs-to-be-done by measuring customer-defined outcomes. It produces a numeric score for each outcome that reveals where customers are over-served (table stakes) and under-served (opportunities).
ODI process
- Define the job - Use the job statement syntax above
- Map the job steps - Break the job into 5-15 sequential steps (the "job map")
- Extract desired outcomes - For each step, ask: "What does success look like?"
Write outcomes in the format:
Minimize the [time/likelihood/effort] of [undesired outcome]
or
Increase the [speed/accuracy/reliability] of [desired outcome]
- Survey customers - For each outcome, ask two questions on a 1-5 scale:
- "How important is this outcome to you?" (importance)
- "How satisfied are you with how well current solutions achieve this?" (satisfaction)
- Calculate the opportunity score:
Opportunity = Importance + max(Importance - Satisfaction, 0)
- Score > 12: Under-served (high opportunity)
- Score 10-12: Appropriately served
- Score < 10: Over-served (table stakes, do not invest)
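The score bands above assume importance and satisfaction on a 0-10 scale (as in Ulwick's formulation); on a raw 1-5 scale the maximum possible score is 9, so the thresholds would never trigger. The sketch below assumes the common convention of doubling the 1-5 survey means, which is a choice to verify against your own survey design:

```python
# Sketch: ODI opportunity scoring from averaged 1-5 survey responses,
# rescaled to 0-10 before applying the formula.

def opportunity_score(importance, satisfaction):
    """importance/satisfaction: mean 1-5 ratings, doubled to a 0-10 scale."""
    imp, sat = importance * 2, satisfaction * 2
    return imp + max(imp - sat, 0)

def classify(score):
    if score > 12:
        return "under-served"
    if score >= 10:
        return "appropriately served"
    return "over-served"

score = opportunity_score(4.5, 2.0)   # very important, poorly satisfied
print(score, classify(score))  # -> 14.0 under-served
```

Note that `max(..., 0)` means over-satisfaction never subtracts from importance: an outcome customers care about is never penalized for being well served, it just earns no bonus.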
Job map template
A job map captures the universal steps a customer goes through when executing a job, regardless of which solution they use:
| Step | Description | Example (personal budgeting) |
|---|---|---|
| 1. Define | Determine goals and plan approach | Decide savings target and budget categories |
| 2. Locate | Gather needed inputs and resources | Collect bank statements, receipts, income records |
| 3. Prepare | Set up the environment for execution | Open spreadsheet/app, set up categories |
| 4. Confirm | Verify readiness before proceeding | Check that all transactions are imported |
| 5. Execute | Perform the core task | Categorize transactions, allocate amounts |
| 6. Monitor | Track progress during execution | Watch running totals, compare to budget |
| 7. Modify | Make adjustments based on monitoring | Reallocate from one category to another |
| 8. Conclude | Finish and document the output | Save the budget, set reminders for next period |
Each step generates 3-8 desired outcomes. A complete job map typically produces 50-150 outcomes to survey.
When to use which JTBD technique
| Technique | Best for | Effort | Output |
|---|---|---|---|
| Job statement writing | Framing the problem space | Low (1-2 hours) | Shared language for the team |
| Switch interviews | Understanding why people change solutions | Medium (5-8 interviews) | Forces diagram, struggling moments |
| ODI scoring | Quantitative prioritization of outcomes | High (survey of 100+ users) | Ranked opportunity scores |
| Job mapping | Detailed process understanding | Medium (workshop + interviews) | Step-by-step outcome list |
opportunity-solution-trees.md
Opportunity Solution Trees
The opportunity solution tree (OST) is a visual decision-making tool created by Teresa Torres as part of the continuous discovery framework. It connects a desired product outcome to customer-discovered opportunities, candidate solutions, and assumption tests. The tree makes the team's thinking visible and ensures that every feature can be traced back to real customer evidence.
Tree structure
[Desired Outcome]
|
+-- [Opportunity 1]
| +-- [Solution 1a]
| | +-- [Assumption test 1]
| | +-- [Assumption test 2]
| +-- [Solution 1b]
| +-- [Assumption test 3]
|
+-- [Opportunity 2]
| +-- [Solution 2a]
| +-- [Solution 2b]
| +-- [Solution 2c]
|
+-- [Opportunity 3]
+-- [Solution 3a]
+-- [Assumption test 4]
Each level of the tree has specific rules:
Level 1: Desired outcome
The root of the tree is a single, measurable product outcome that the team has been assigned to improve.
Rules for good outcomes:
- Must be a metric the product team can directly influence (not revenue or NPS alone)
- Must be measurable with existing or achievable instrumentation
- Should be a leading indicator, not a lagging one
- Must have a current baseline and a target
Good outcomes:
- "Increase 7-day activation rate from 30% to 50%"
- "Reduce time-to-first-value from 12 minutes to under 3 minutes"
- "Increase weekly active usage from 2.1 to 3.5 sessions per user"
Bad outcomes:
- "Increase revenue" (too broad, team cannot directly influence)
- "Make users happy" (not measurable)
- "Ship the new dashboard" (this is an output, not an outcome)
Level 2: Opportunities
Opportunities are unmet customer needs, pain points, or desires discovered through research. They represent the problem space - the "why" behind the metric.
Where opportunities come from
| Source | How to extract opportunities |
|---|---|
| Customer interviews | Identify struggling moments, workarounds, and unmet needs |
| Support tickets | Cluster recurring complaints into themes |
| Behavioral analytics | Find drop-off points, rage clicks, abandoned flows |
| Sales call recordings | Note objections and feature requests; reframe as underlying needs |
| Usability tests | Identify tasks users cannot complete or complete with difficulty |
| NPS/CSAT verbatims | Theme open-text responses by underlying need |
Writing opportunity statements
Use customer language, not internal jargon. An opportunity describes a need, not a solution.
| Quality | Example |
|---|---|
| Bad (solution) | "We need an AI categorizer" |
| Bad (vague) | "Users have trouble with categories" |
| Good (need) | "Users spend too much time manually categorizing transactions" |
| Good (need) | "Users miss important recurring charges because they get buried in the feed" |
Structuring the opportunity space
Opportunities can be nested. A parent opportunity represents a broad theme; child opportunities represent specific aspects of that theme.
Users struggle to understand where their money goes each month
|
+-- Manual transaction categorization takes too long
+-- Recurring charges are hard to identify and track
+-- Spending in one category bleeds into another without warning
Rules:
- Each parent should have 2-5 children (more means it needs further decomposition)
- Children should be mutually exclusive and collectively exhaustive (MECE) where possible
- Leaf opportunities should be specific enough to generate solution ideas
- Every opportunity must be traceable to at least one piece of customer evidence
Level 3: Solutions
Solutions are ideas for addressing a specific opportunity. They represent the "how."
Solution ideation rules
- Generate at least 3 solutions per opportunity - This prevents premature commitment and single-option bias. Bad ideas in the mix actually improve decision quality by creating contrast.
- Keep solutions lightweight - A sentence or two is enough. No PRDs, no wireframes, no technical specs at this stage.
- Vary the solution types - Include at least one "boring" solution (simple, proven pattern), one "creative" solution (novel approach), and one "remove" solution (what if we eliminated the problem instead of solving it?).
- Score solutions against the target outcome - Ask: "If this solution works perfectly, how much would it move the outcome metric?" Discard solutions with low potential impact.
Solution comparison template
OPPORTUNITY: [statement]
TARGET OUTCOME: [metric]
| Criteria | Solution A | Solution B | Solution C |
|-------------------|-------------------|-------------------|-------------------|
| Impact on outcome | High / Med / Low | High / Med / Low | High / Med / Low |
| Reach | All / Segment | All / Segment | All / Segment |
| Confidence | High / Med / Low | High / Med / Low | High / Med / Low |
| Effort | S / M / L / XL | S / M / L / XL | S / M / L / XL |
| Riskiest assumption | [what] | [what] | [what] |
Level 4: Assumption tests (experiments)
Each solution carries assumptions. The tree branches into experiments that test the riskiest assumptions with the cheapest possible method.
From solution to experiment
- List all assumptions for the chosen solution (use the four categories: desirability, viability, feasibility, usability)
- Identify the one assumption that, if wrong, would make the entire solution worthless
- Design the cheapest experiment that produces a clear yes/no signal on that assumption
- Define success criteria and kill criteria before running the experiment
- Run the experiment, update the tree with the result, and decide: pursue, pivot, or stop
See references/assumption-testing.md for the full experiment catalog.
Maintaining the tree over time
Weekly update cadence
The product trio should update the OST weekly during their continuous discovery ritual:
- Add new opportunities from this week's customer touchpoints
- Retire invalidated solutions based on experiment results
- Promote validated solutions to the delivery backlog
- Rebalance the tree - if one branch is getting all the attention, check whether other high-priority opportunities are being neglected
Tree health checks
Run these checks monthly:
| Check | Healthy sign | Unhealthy sign |
|---|---|---|
| Evidence freshness | All opportunities cite evidence from last 90 days | Opportunities based on assumptions or ancient data |
| Solution diversity | 3+ solutions per active opportunity | Single solution committed without comparison |
| Experiment velocity | 1-2 experiments completed per week | No experiments run in the last month |
| Outcome connection | Every opportunity clearly ties to the outcome metric | Orphan opportunities with no clear metric link |
| Balanced exploration | Multiple branches being explored | All effort concentrated on one branch |
Archiving, not deleting
When an experiment invalidates a solution:
- Move the solution branch to an "archived" section of the tree
- Add a note: what was tested, what the result was, and what was learned
- The learning prevents future teams from re-testing the same failed assumption
Common OST anti-patterns
| Anti-pattern | Symptom | Fix |
|---|---|---|
| Solution-first tree | Solutions appear without parent opportunities | Interview customers; opportunities must come from evidence |
| Single-branch tree | Only one opportunity with one solution | Brainstorm harder; you are not exploring the space |
| Stale tree | No updates in 3+ weeks | Re-establish weekly cadence; the tree reflects active discovery |
| Output-as-outcome | Root node is "Ship feature X" | Replace with a measurable behavior change metric |
| Kitchen-sink tree | 20+ opportunities, none deeply explored | Prioritize 2-3 opportunities; depth beats breadth |
Frequently Asked Questions
What is product-discovery?
Use this skill when applying Jobs-to-be-Done, building opportunity solution trees, mapping assumptions, or validating product ideas. Triggers on product discovery, JTBD, jobs-to-be-done, opportunity solution trees, assumption mapping, experiment design, prototype testing, and any task requiring product discovery methodology.
How do I install product-discovery?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill product-discovery in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support product-discovery?
product-discovery works with claude-code, gemini-cli, openai-codex. Install it once and use it across any supported AI coding agent.