product-discovery
Use this skill when applying Jobs-to-be-Done, building opportunity solution trees, mapping assumptions, or validating product ideas. Triggers on product discovery, JTBD, jobs-to-be-done, opportunity solution trees, assumption mapping, experiment design, prototype testing, and any task requiring product discovery methodology.
product-discovery is a production-ready AI agent skill for claude-code, gemini-cli, openai-codex. It covers applying Jobs-to-be-Done, building opportunity solution trees, mapping assumptions, and validating product ideas.
Quick Facts
| Field | Value |
|---|---|
| Category | product |
| Version | 0.1.0 |
| Platforms | claude-code, gemini-cli, openai-codex |
| License | MIT |
How to Install
- Make sure you have Node.js installed on your machine.
- Run the following command in your terminal:
npx skills add AbsolutelySkilled/AbsolutelySkilled --skill product-discovery
- The product-discovery skill is now available in your AI coding agent (Claude Code, Gemini CLI, OpenAI Codex, etc.).
Overview
Product discovery is the ongoing practice of learning what to build before - and while - building it. The goal is to reduce risk: shipping the wrong thing is far more expensive than the research that would have prevented it. This skill covers Jobs-to-be-Done (JTBD), opportunity solution trees, assumption mapping, experiment design, and prototype testing - giving an agent the judgment to run rigorous discovery the way a senior product manager or product trio would.
Tags
product-discovery jtbd opportunity-trees assumptions validation
Platforms
- claude-code
- gemini-cli
- openai-codex
Frequently Asked Questions
What is product-discovery?
product-discovery is an AI agent skill for applying Jobs-to-be-Done, building opportunity solution trees, mapping assumptions, and validating product ideas. It triggers on product discovery, JTBD, opportunity solution trees, assumption mapping, experiment design, prototype testing, and any task requiring product discovery methodology.
How do I install product-discovery?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill product-discovery in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support product-discovery?
This skill works with claude-code, gemini-cli, openai-codex. Install it once and use it across any supported AI coding agent.
Maintainers
Generated from AbsolutelySkilled
SKILL.md
Product Discovery
Product discovery is the ongoing practice of learning what to build before - and while - building it. The goal is to reduce risk: shipping the wrong thing is far more expensive than the research that would have prevented it. This skill covers Jobs-to-be-Done (JTBD), opportunity solution trees, assumption mapping, experiment design, and prototype testing - giving an agent the judgment to run rigorous discovery the way a senior product manager or product trio would.
When to use this skill
Trigger this skill when the user:
- Asks how to apply Jobs-to-be-Done or conduct JTBD interviews
- Wants to build or review an opportunity solution tree
- Needs to map, categorize, or prioritize assumptions
- Is designing an experiment, A/B test, or validation study
- Wants to run or evaluate prototype tests (concept, usability, or value)
- Asks how to synthesize qualitative or quantitative discovery data
- Needs to establish a discovery cadence or dual-track workflow
- Is deciding between multiple product bets or solution directions
Do NOT trigger this skill for:
- Pure delivery execution (sprint planning, story writing, velocity - use agile-scrum)
- Growth hacking or marketing experimentation (use a growth or marketing skill)
Key principles
Discover continuously, not in phases - Discovery is not a gate before delivery. It runs in parallel with shipping. Every sprint produces both validated learning and working software. "Done with discovery" is a warning sign.
Outcomes over outputs - The goal is a measurable change in customer behavior, not a feature shipped. Define success as a behavioral outcome first; the solution is just a hypothesis about how to reach it.
Test assumptions, not ideas - Every solution idea rests on a stack of assumptions. Surface the riskiest ones first and test those - not the idea in its entirety. Testing a single assumption is typically an order of magnitude faster than validating the whole idea.
Smallest experiment possible - Always ask: "What is the cheapest, fastest way to learn whether this assumption is true?" A 5-minute interview, a smoke test, or a paper prototype can invalidate months of engineering work.
Dual-track: discovery and delivery in parallel - One track discovers the next problem worth solving; the other delivers on already-validated solutions. Teams that separate these into sequential phases go dark on learning for months at a time.
Core concepts
JTBD Framework
Jobs-to-be-Done treats customer behavior as hiring a product to do a job. The canonical JTBD statement is:
"When [situation], I want to [motivation], so I can [expected outcome]."
Jobs have three layers:
- Functional job - The practical task (file my taxes quickly)
- Emotional job - How the customer wants to feel (confident I won't get audited)
- Social job - How they want to be perceived (look responsible to my partner)
Strong solutions address all three layers. Most competitors only address the functional job, leaving emotional and social value uncaptured.
Interview for jobs by asking about the last time the customer did the relevant behavior - not hypotheticals. "Tell me about the last time you..." surfaces actual pull, struggle, and workaround data.
Opportunity Solution Trees
The opportunity solution tree (OST) - developed by Teresa Torres - is a visual tool that maps the path from a desired outcome to the experiments that test candidate solutions.
Desired Outcome
+-- Opportunity 1 (unmet need / pain / desire)
| +-- Solution A
| | +-- Assumption 1 --> Experiment
| | +-- Assumption 2 --> Experiment
| +-- Solution B
| +-- Assumption 3 --> Experiment
+-- Opportunity 2
+-- ...
Key rules:
- The root is always an outcome (metric), never a solution
- Opportunities are discovered from customers - not invented in the office
- Each solution sits below a single opportunity - never jump to solution without an opportunity
- Every solution has at least one assumption being actively tested
Assumption Types
Every product bet rests on four categories of assumptions:
| Type | Question it answers | Example |
|---|---|---|
| Desirability | Do customers want this? | "Users want to share playlists with non-subscribers" |
| Viability | Can we make money from it? | "Enterprise customers will pay $50/seat for SSO" |
| Feasibility | Can we build it? | "We can infer intent from existing event data" |
| Usability | Can customers use it without friction? | "Users can complete onboarding without a tooltip" |
Prioritize assumptions by: risk x proximity to a decision. Test the assumption that, if wrong, would kill the bet - before testing assumptions about optimization.
Experiment Hierarchy
From lowest to highest fidelity and cost:
- Assumption audit - List and stack-rank assumptions; no customer contact yet
- Secondary research - Existing data, competitor analysis, academic studies
- Customer interview - 30-60 min; 5-8 participants for a theme to emerge
- Survey - Quantifies frequency of a qualitatively discovered pattern
- Smoke test / landing page - Measures real intent without building the feature
- Wizard of Oz - Manual fulfillment behind a product interface
- Prototype test - Simulates the experience at chosen fidelity (paper, lo-fi, hi-fi)
- Concierge MVP - Deliver the value manually; learn the job deeply
- Technical spike - Validate feasibility assumption with a time-boxed build
- A/B test / live experiment - Measures actual behavior change in production
See references/experiment-playbook.md for templates by assumption type.
Common tasks
Conduct JTBD interviews
Framework (45-60 min):
Recruitment - Screen for people who have recently done the behavior you're studying. Recent = within 90 days. Avoid future-intent screening questions.
Timeline reconstruction (20 min) - "Walk me through everything that happened from the moment you first realized you needed [solution category] to the moment you made a decision." Map: first thought, passive looking, active looking, deciding.
Dig into the struggle (15 min) - "What had you tried before? What was unsatisfying? What almost made you not switch?"
Outcomes and anxieties (10 min) - "What were you hoping would be different? What were you worried might not work?"
Wrap (5 min) - "If you could change one thing about [product], what would it be?" Use sparingly - this is ideation, not discovery.
Output: Job stories, struggle patterns, and switch triggers. Theme across 5+ interviews before drawing conclusions.
Build an opportunity solution tree
Start with the outcome - Name the metric the product trio owns this quarter, e.g., "Increase week-2 retention from 42% to 55%."
Generate opportunities from interview data - Each opportunity is an unmet need, pain, or desire expressed by a real customer. Do not invent opportunities in workshops.
Cluster and name - Group related struggles. Name them as customer problems ("I lose context when switching devices"), not solutions ("add cross-device sync").
Select the focus opportunity - Use impact/confidence/ease to compare. Pick one.
Brainstorm solutions - Generate 3+ candidate solutions per opportunity. Quantity over quality at this stage. Include unconventional ideas.
Map assumptions per solution - For each candidate, list what must be true for it to work. Sort by type (desirability/viability/feasibility/usability).
Design one experiment per risky assumption - Smallest test that could change your mind. Assign owner and timeline.
Map and prioritize assumptions
Use a 2x2 matrix: Certainty (known vs. unknown) x Risk (low vs. high).
- High risk, low certainty - Test immediately. These are bet-killers.
- High risk, high certainty - Monitor. You believe these but should revisit if evidence shifts.
- Low risk, low certainty - Research when convenient. Won't kill the bet.
- Low risk, high certainty - Ignore for now.
For each risky assumption, write a falsifiable statement: "We believe X. We will know this is true when we see Y. We will know it is false when we see Z."
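The 2x2 triage and the falsifiable-statement template can be expressed as two small helpers. A minimal sketch; the quadrant labels follow the list above and the function names are my own.

```python
def triage(risk: str, certainty: str) -> str:
    """Place an assumption in the Certainty x Risk 2x2 described above."""
    actions = {
        ("high", "low"): "test immediately - bet-killer",
        ("high", "high"): "monitor - revisit if evidence shifts",
        ("low", "low"): "research when convenient",
        ("low", "high"): "ignore for now",
    }
    return actions[(risk, certainty)]

def falsifiable(belief: str, true_signal: str, false_signal: str) -> str:
    """Render the falsifiable-statement template for a risky assumption."""
    return (f"We believe {belief}. We will know this is true when we see "
            f"{true_signal}. We will know it is false when we see {false_signal}.")
```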
Design validation experiments
Match the experiment type to the assumption category:
| Assumption type | Preferred experiment | Signal to look for |
|---|---|---|
| Desirability | Customer interview, smoke test | Pull signals + click-through rate |
| Viability | Pricing interview, willingness-to-pay study | 20%+ "definitely would pay" at target price |
| Feasibility | Technical spike, data audit | Can be built within X sprints |
| Usability | Usability test (think-aloud) | Task completion rate, errors, time-on-task |
Every experiment needs: hypothesis, method, sample size, success criterion, and a kill threshold - the result that would lead you to abandon the bet.
See references/experiment-playbook.md for detailed templates.
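The required experiment fields can be captured as a record that refuses to exist without a pre-committed kill threshold, making post-hoc rationalization structurally harder. Field names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentBrief:
    hypothesis: str
    method: str             # e.g. "smoke test", "technical spike"
    sample_size: int
    success_criterion: str  # e.g. ">= 30% click-through"
    kill_threshold: str     # the result that means: abandon the bet

    def __post_init__(self):
        # A brief without a kill threshold is validation theater.
        if not self.kill_threshold.strip():
            raise ValueError("Define the kill threshold before collecting data")
```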
Run prototype tests
Choose fidelity based on what you're testing:
| Fidelity | Best for | Tools |
|---|---|---|
| Paper / sketch | Flow and information architecture | Pen, Balsamiq |
| Lo-fi wireframe | Navigation and content hierarchy | Figma (no styling) |
| Hi-fi mockup | Visual design and emotional response | Figma, Framer |
| Coded prototype | Interaction quality, performance perception | Storybook, CodeSandbox |
| Production feature | Behavior change, retention, conversion | Feature flag in prod |
Think-aloud protocol: Brief the participant ("we're testing the design, not you"), ask them to narrate thoughts as they navigate, do not hint or help, note confusion and errors, debrief after each task. Five participants reveal ~85% of usability issues.
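The "~85% with five participants" figure comes from the standard usability-problem discovery model, P(found) = 1 - (1 - p)^n, with p = 0.31 as the commonly cited per-participant detection probability (an assumption; real detection rates vary by product and task).

```python
def share_of_issues_found(n_participants: int, p_detect: float = 0.31) -> float:
    """Expected share of usability issues seen by at least one participant."""
    return 1 - (1 - p_detect) ** n_participants

# round(share_of_issues_found(5), 2) -> 0.84, i.e. ~85% with five participants
```

The curve flattens quickly: the marginal participant after the fifth reveals little, which is why iterating on a fixed small panel beats one large study.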
Synthesize discovery insights
Structure synthesis as: observation - pattern - insight - implication.
- Observation - What one customer said or did (raw data)
- Pattern - What appeared across multiple customers (theme)
- Insight - Why this pattern exists (interpretation)
- Implication - What it means for the product (decision input)
Avoid jumping from observation to implication. The missing middle is where discovery adds value over anecdote.
Affinity mapping: Write each observation on its own sticky. Group silently. Name groups as customer problems, not solutions. Rank by frequency and intensity of pain.
Create a discovery cadence for the team
A sustainable cadence for a three-person product trio (PM, designer, engineer):
| Cadence | Activity | Time |
|---|---|---|
| Weekly | 2-3 customer interviews or usability sessions | 2-3 hrs |
| Weekly | Assumption review: what did we learn, what changed? | 30 min |
| Bi-weekly | OST review: update tree with new opportunities and learnings | 1 hr |
| Monthly | Opportunity prioritization: re-rank based on new evidence | 1 hr |
| Quarterly | Outcome review: did we move the metric? What next? | 2 hrs |
Talking to 2-3 customers per week, compounded over a year, builds an understanding advantage that teams who research in batches cannot match.
Anti-patterns
| Anti-pattern | Why it's harmful | What to do instead |
|---|---|---|
| Big-bang discovery | 6-week research phase before a project; team goes dark on learning during delivery | Embed 2-3 interviews per week alongside shipping; discovery never stops |
| Solution-first OST | Listing features at the root of the tree instead of an outcome | Always start with a measurable outcome metric; solutions are hypotheses |
| Validation theater | Running research to confirm a decision already made; cherry-picking supporting quotes | Write a kill threshold before the study: the result that would change your mind |
| Over-fitting to one customer | Pivoting strategy based on feedback from a single vocal customer | Require a pattern across 5+ independent sources before changing direction |
| Premature high-fidelity | Pixel-perfect prototypes before validating the core job | Match fidelity to the assumption; paper prototypes can kill 80% of bad ideas cheaply |
| Skipping feasibility | Testing only desirability; engineering discovers a blocker in sprint 3 | Include an engineer in discovery; run a technical spike for any novel feasibility assumption |
Gotchas
Recruiting interviewees through your own app produces selection bias - Users who respond to an in-app recruitment banner are your most engaged advocates. They will tell you the product is great and suggest incremental improvements. To discover why users churn or never activate, you must recruit from people who did not engage - churned users, trial non-converters, and target-persona non-users. Use external recruitment panels for discovery that needs unbiased signal.
Opportunity solution trees built in workshops produce solutions disguised as opportunities - When teams generate the OST collaboratively in a room, "opportunities" are often features rephrased as problems ("users want a better export experience" is a solution frame, not an opportunity). Real opportunities come from verbatim customer language captured in interviews, not from workshop sticky notes. Build the OST from interview data, not from team hypotheses.
Smoke tests measure intent to click, not willingness to pay or actual use - A high click-through rate on a "coming soon" landing page is a desirability signal, not a conversion signal. Users who click are curious; they have not committed to changing behavior, paying, or integrating the feature into their workflow. Smoke tests invalidate "no one wants this" but do not validate "people will pay and retain."
Using a high-fidelity prototype for flow testing anchors users on visual design - When a prototype looks production-ready, participants comment on button colors and copy instead of navigating authentically and revealing flow problems. For testing information architecture and navigation, deliberately use lo-fi wireframes. Reserve hi-fi prototypes for testing emotional response and design quality.
Kill thresholds defined after the experiment results are in are rationalization, not rigor - If you decide what "failure looks like" after you see the data, you will unconsciously set the threshold to preserve your preferred conclusion. Write the kill threshold - the specific metric result that would cause you to abandon or pivot the bet - in the experiment design document before the study begins.
References
references/experiment-playbook.md - Experiment templates by assumption type with success criteria, sample sizes, and analysis guidance
Reference files
assumption-testing.md
Assumption Testing
This reference covers the full catalog of discovery experiment types, organized by which assumption category they test. Use it to pick the fastest, cheapest experiment for the riskiest assumption on your opportunity solution tree.
The assumption matrix
Before choosing an experiment, classify assumptions on two dimensions:
HIGH IMPORTANCE
(if wrong, solution fails)
|
Leap-of-faith | Known risk
TEST THESE FIRST | Test these second
|
LOW EVIDENCE ----------+---------- STRONG EVIDENCE
|
Irrelevant | Table stakes
Deprioritize | Monitor, don't test
|
LOW IMPORTANCE
(if wrong, solution still works)
Focus your experiment budget on the upper-left quadrant: high importance, low evidence. These are leap-of-faith assumptions.
Experiment catalog by assumption type
Desirability experiments
Test whether customers actually want the solution.
| Experiment | How it works | Duration | Sample size | Signal strength |
|---|---|---|---|---|
| Fake door test | Add a button/link for the feature that leads to a "coming soon" page. Measure click rate. | 3-7 days | 500+ impressions | Medium - shows interest, not commitment |
| Landing page test | Create a landing page describing the value prop. Measure sign-up or waitlist conversion. | 5-14 days | 1000+ visitors | Medium-high - real behavior on perceived real offering |
| Concierge test | Manually deliver the solution to 5-10 users. Observe engagement and willingness to continue. | 1-2 weeks | 5-10 users | High - real usage, real feedback |
| Wizard of Oz | Users interact with what looks like a working product, but a human performs the backend work. | 1-2 weeks | 10-20 users | High - tests full experience without building |
| Pre-order / LOI | Ask customers to pre-order or sign a letter of intent. Money or commitment on the table. | 1-2 weeks | 10-50 prospects | Very high - willingness to pay is the strongest signal |
| One-question survey | Ask existing users: "How disappointed would you be if you could no longer use [feature]?" | 2-3 days | 100+ responses | Medium - Sean Ellis test for product-market fit |
Viability experiments
Test whether the business model works.
| Experiment | How it works | Duration | Sample size | Signal strength |
|---|---|---|---|---|
| Pricing page test | Show different price points to different cohorts and measure conversion. | 1-2 weeks | 500+ per variant | High - real purchasing behavior |
| Willingness-to-pay survey | Van Westendorp price sensitivity meter: ask 4 pricing questions to find acceptable range. | 3-5 days | 50-100 respondents | Medium - stated preference, not revealed |
| Unit economics model | Build a spreadsheet: CAC, LTV, margin, payback period. Stress-test with pessimistic inputs. | 1 day | N/A (analytical) | Low-medium - depends on input quality |
| Partner/vendor feasibility call | Call potential partners or vendors to confirm terms, pricing, and integration requirements. | 1-3 days | 3-5 calls | Medium - verbal commitments are soft |
Feasibility experiments
Test whether the team can build it.
| Experiment | How it works | Duration | Sample size | Signal strength |
|---|---|---|---|---|
| Technical spike | Engineer builds the riskiest technical component in isolation. Time-boxed. | 1-3 days | N/A | High - proves or disproves the technical hypothesis |
| API prototype | Build a minimal API that handles the core data flow. No UI. | 2-5 days | N/A | High - surfaces integration issues early |
| Third-party evaluation | Evaluate vendor APIs, libraries, or services for the key capability. | 1-2 days | 3-5 vendors | Medium-high - reveals real constraints |
| Data audit | Check whether the data needed for the solution exists, is accessible, and is clean enough. | 1 day | N/A | High - data problems kill more features than code problems |
Usability experiments
Test whether users can figure out the solution.
| Experiment | How it works | Duration | Sample size | Signal strength |
|---|---|---|---|---|
| Paper prototype test | Sketch the key screens on paper or whiteboard. Walk 5 users through the flow. | 1 day | 5 users | Medium - tests concept and flow, not visual design |
| Clickable prototype test | Build a Figma/Framer prototype. Run moderated usability tests with think-aloud protocol. | 3-5 days | 5-8 users | High - realistic interaction patterns |
| First-click test | Show users a screen and ask: "Where would you click to [accomplish task]?" Measure accuracy. | 1-2 days | 20-50 users | Medium - fast signal on navigation and layout |
| Five-second test | Show a screen for 5 seconds. Ask what the user remembers and what they think it does. | 1 day | 20-30 users | Low-medium - tests clarity of value proposition |
| Unmoderated task test | Record users completing specific tasks in the prototype without a facilitator. | 3-5 days | 10-20 users | Medium-high - natural behavior, larger sample |
Designing experiments that can fail
The most common failure mode in discovery is designing experiments that confirm the idea regardless of the result. Every experiment must have:
1. A falsifiable hypothesis
We believe that [specific user segment]
will [specific measurable behavior]
when [specific condition/change is introduced]
because [rationale from customer evidence].
2. Pre-committed success criteria
Define before data collection:
- Success threshold: "If >X% of users do Y, we proceed"
- Kill threshold: "If <Z% of users do Y, we stop"
- Ambiguous zone: "If between Z% and X%, we need more data"
3. Decision rules
| Result | Action |
|---|---|
| Above success threshold | Move solution to delivery backlog; test next riskiest assumption |
| In ambiguous zone | Design a more targeted experiment; increase sample size |
| Below kill threshold | Archive the solution; return to the opportunity and try a different solution |
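The decision rules above reduce to a small function once the thresholds are pre-committed in the experiment brief. The strict inequalities follow the template ("If >X% ... If <Z%"); boundary values fall in the ambiguous zone.

```python
def decide(result: float, success_threshold: float, kill_threshold: float) -> str:
    """Apply pre-committed decision rules to an observed experiment result:
    above success proceeds, below kill stops, in between needs more data."""
    if result > success_threshold:
        return "proceed"
    if result < kill_threshold:
        return "stop"
    return "ambiguous"
```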
Experiment sequencing
Not all assumptions need testing. Use this decision tree:
Is this assumption high-importance?
|
NO --> Skip (even if wrong, solution still works)
YES --> Do we have strong evidence already?
|
YES --> Monitor, don't test
NO --> Is it a desirability assumption?
|
YES --> Test FIRST (if nobody wants it, nothing else matters)
NO --> Test SECOND (viability, feasibility, usability)
General sequencing rule: Desirability > Viability > Usability > Feasibility
Rationale: Desirability failures are the most expensive to discover late. Feasibility failures are the cheapest - engineers usually know within days whether something is buildable. Test in order of "cost of being wrong late."
Worked example
Context: A budgeting app team discovers that users struggle to track subscription charges across multiple payment methods.
Solution idea: Auto-detect and consolidate all recurring charges into a single "Subscriptions" view.
Assumptions identified:
| # | Assumption | Category | Importance | Evidence |
|---|---|---|---|---|
| A1 | Users want to see all subscriptions in one place | Desirability | High | Weak (3/6 interviews mentioned it) |
| A2 | We can reliably detect recurring charges from transaction data | Feasibility | High | Weak (untested algorithm) |
| A3 | Users will check the subscription view at least monthly | Desirability | Medium | None |
| A4 | Consolidation reduces churn from the budgeting app | Viability | Medium | None |
| A5 | Users can understand the categorization without explanation | Usability | Medium | None |
Testing order:
- A1 (desirability, high importance, weak evidence) - Fake door test
- A2 (feasibility, high importance, weak evidence) - Technical spike
- A5 (usability, medium importance, no evidence) - Paper prototype test
- A3 and A4 can be measured post-launch with analytics
Experiment for A1:
EXPERIMENT BRIEF
================
Assumption: Users want to see all subscriptions in one place (A1)
Experiment type: Fake door test
Setup: Add a "Subscriptions" tab to the main navigation for 50% of
users. Clicking it shows a "Coming soon - join the waitlist" modal.
Participants: 50% of active users (A/B split), ~2000 users per arm
Success metric: Waitlist sign-up rate among those who see the tab
Kill criteria: If <3% of users who see the tab click it within 7 days,
the demand signal is too weak to pursue.
Success criteria: If >8% of users click and >40% of clickers join the
waitlist, proceed to feasibility spike (A2).
Timeline: 7 days
Decision:
If success: Run technical spike for A2
If kill: Return to opportunity; explore other solution ideas
experiment-playbook.md
Experiment Playbook
Every product experiment should be matched to the type of assumption being tested. Using a high-fidelity experiment for a low-certainty desirability assumption wastes weeks. Using a quick interview for a feasibility question produces noise. This playbook maps assumption categories to the right experiment type with templates, sample sizes, and success criteria.
Universal Experiment Brief Template
Fill this out before running any experiment. Define the kill threshold before collecting data - post-hoc rationalization is the enemy of real learning.
EXPERIMENT BRIEF
================
Assumption:
[One specific, falsifiable belief you are testing]
Assumption type:
[ ] Desirability [ ] Viability [ ] Feasibility [ ] Usability
Experiment type:
[Customer interview / smoke test / survey / prototype test /
technical spike / wizard of oz / concierge / A/B test]
Setup:
[What to build or prepare - keep it minimal]
Participants:
Who: [Customer segment]
How many: [Minimum to reach signal]
Recruited via: [Source]
Success metric:
[One quantitative signal - task completion rate, click-through, etc.]
Success threshold:
[e.g., >= 30% click-through, >= 4/5 task completions]
Kill threshold:
[e.g., < 15% click-through, < 2/5 task completions]
Timeline:
Start: [Date] End: [Date] Max duration: [1-2 weeks]
Decision rules:
If success threshold is met: [next step]
If kill threshold is hit: [pivot or stop]
If results are ambiguous: [follow-up action]
Desirability Experiments
Question answered: Do customers want this? Will they change behavior to get it?
Desirability is usually the first assumption to test. If nobody wants it, viability and feasibility are irrelevant.
Customer Interview (Switch Interview)
Best for early-stage discovery when you have not yet narrowed to a solution.
When to use: Before building anything. You want to understand struggle and pull.
Setup:
- 45-60 min per session
- Recruit people who have recently done the behavior you're studying (within 90 days)
- Use the timeline reconstruction method (see SKILL.md)
Sample size: 5-8 participants per customer segment. Stop when you stop hearing new themes (theoretical saturation).
Signal: Frequency and intensity of struggle mentions. Look for:
- Workarounds ("I end up having to...")
- Switch stories ("I switched from X because...")
- Emotional language ("It drives me crazy when...")
Kill signal: 0 of 5 participants express the struggle you hypothesized. Revisit your opportunity framing before testing a solution.
Template - Key questions:
JTBD Interview Guide
====================
Opening (5 min):
"Tell me a bit about your role and how you [domain area] day-to-day."
Timeline reconstruction (20 min):
"Walk me through the last time you [target behavior].
Start from the moment you first realized you needed to do it."
Dig into struggle (15 min):
"What had you tried before that didn't work?"
"What's the most frustrating part of the current process?"
"Have you ever paid for something to help with this? What happened?"
Outcomes and anxieties (10 min):
"What would your ideal world look like after this is solved?"
"What would make you nervous about switching to something new?"
Wrap (5 min):
"Is there anything about this topic I haven't asked that seems important?"
Smoke Test / Fake Door
Best for testing whether users will take a concrete action toward a not-yet-built feature.
When to use: After qualitative interviews reveal a pattern. You want to quantify intent with actual behavior, not reported preference.
Setup:
- Add a UI element (button, menu item, landing page) for the non-existent feature
- When a user clicks/signs up, show a "coming soon" screen and optionally collect email
- Track click-through rate and, if collecting emails, conversion to sign-up
Sample size: 200-500 unique visitors (or 2 weeks of live traffic, whichever comes first)
Success metric: Click-through rate on the fake door CTA
Typical benchmarks:
| CTR range | Signal |
|---|---|
| < 5% | Weak demand - revisit the opportunity |
| 5-15% | Moderate - investigate with follow-up interviews |
| 15-30% | Strong signal - proceed to solution design |
| > 30% | Very strong - prioritize immediately |
Kill threshold: Define before launch. Example: "If CTR < 8% after 300 visitors, we pause this solution and explore alternative opportunities."
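The benchmark bands above map directly to an interpretation helper. The band edges follow the table; how I assign the exact boundary values (15% and 30% land in "strong") is my own choice.

```python
def ctr_signal(ctr: float) -> str:
    """Interpret a fake-door click-through rate against the benchmark bands."""
    if ctr < 0.05:
        return "weak demand - revisit the opportunity"
    if ctr < 0.15:
        return "moderate - investigate with follow-up interviews"
    if ctr <= 0.30:
        return "strong signal - proceed to solution design"
    return "very strong - prioritize immediately"
```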
Template - Fake door copy:
Fake Door Copy Template
=======================
Headline: [Specific outcome promise, not feature name]
e.g., "Export reports directly to your accountant in one click"
Subhead: [Concrete evidence of value]
e.g., "Save 2 hours every month on manual data entry"
CTA: [Action-oriented, specific]
e.g., "Turn on auto-export" (not "Learn more")
Post-click: "This feature is coming soon. We're building it now.
Leave your email to be first in line."
Concept Test Survey
Best for quantifying a pattern already discovered qualitatively.
When to use: After 5+ interviews reveal a theme. You want to know "how many?" before committing engineering.
Setup:
- 5-8 questions max
- Lead with the struggle, then present the solution concept (text or low-fi image)
- Measure: problem frequency, current solution satisfaction, willingness to switch
Sample size: 100-300 responses (email list, in-app prompt, or panel)
Template - Core questions:
Survey Template
===============
Q1 (Frequency): How often do you [target struggle]?
[ ] Daily [ ] Weekly [ ] Monthly [ ] Rarely [ ] Never
Q2 (Pain intensity): How much of a problem is [struggle] for you?
1 (Not a problem) ---- 5 (Major problem)
Q3 (Concept): [2-sentence description of solution concept]
"Imagine if [product] could [value prop]. You would [specific action]."
Q4 (Desirability): How useful would this be for you?
1 (Not useful) ---- 5 (Extremely useful)
Q5 (Open): What would you need to see to trust this feature?
[Free text]
Success threshold: >= 40% rate the concept 4+ on usefulness AND struggle is 4+ intensity. Lower than this suggests the opportunity needs reframing.
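One reading of the joint threshold - at least 40% of respondents rating both pain intensity (Q2) and usefulness (Q4) at 4 or higher - can be checked like this. The response format is an assumption for illustration.

```python
def concept_passes(responses, threshold=0.40):
    """responses: list of (pain_intensity, usefulness) pairs on 1-5 scales.
    Passes when >= `threshold` of respondents rate BOTH at 4 or higher."""
    hits = sum(1 for pain, useful in responses if pain >= 4 and useful >= 4)
    return hits / len(responses) >= threshold
```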
Viability Experiments
Question answered: Can we build a sustainable business around this? Will customers pay, and at what price?
Willingness-to-Pay Interview
Best for B2B or prosumer products where pricing is a key unknown.
When to use: Before pricing a new tier or feature. After desirability is established.
Setup:
- 30-45 min interview
- Show a working prototype or detailed description
- Use the Van Westendorp Price Sensitivity Meter
Sample size: 15-25 participants (B2B); 30-50 (B2C)
Van Westendorp questions:
Van Westendorp Price Sensitivity
=================================
After showing the product/feature:
Q1 (Too cheap): "At what price would this be so cheap that
you'd question the quality?"
Q2 (Cheap but acceptable): "At what price would this be a
bargain - great value for money?"
Q3 (Getting expensive): "At what price would this start to
feel expensive, but you'd still consider it?"
Q4 (Too expensive): "At what price would this be so expensive
that you wouldn't consider buying it?"
Plot responses to find:
- Acceptable price range: intersection of "too cheap" and "too expensive" curves
- Optimal price point: intersection of "cheap but acceptable" and "getting expensive"
Kill signal: Optimal price point is below your required revenue per customer. Revisit the value proposition or target segment before investing further.
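A simplified version of the acceptable-range calculation can be sketched in code. A full Van Westendorp analysis interpolates four cumulative curves; the version below (an assumption-laden simplification, not the canonical method) uses only the Q1 and Q4 answers and grid-searches for the band where neither objection exceeds 50% of respondents:

```python
# Sketch: find the price band where fewer than half of respondents
# would call the price "too cheap" (Q1) or "too expensive" (Q4).

def acceptable_price_range(too_cheap, too_expensive, step=1.0):
    """too_cheap / too_expensive: per-respondent answers to Q1 and Q4."""
    n = len(too_cheap)
    lo, hi = min(too_cheap), max(too_expensive)
    acceptable = []
    p = float(lo)
    while p <= hi:
        pct_cheap = sum(1 for x in too_cheap if p <= x) / n
        pct_expensive = sum(1 for x in too_expensive if p >= x) / n
        if pct_cheap < 0.5 and pct_expensive < 0.5:
            acceptable.append(p)
        p += step
    return (acceptable[0], acceptable[-1]) if acceptable else None

q1 = [5, 8, 10, 6, 7]      # "too cheap" answers ($)
q4 = [25, 30, 40, 20, 35]  # "too expensive" answers ($)
print(acceptable_price_range(q1, q4))  # -> (8.0, 29.0)
```

With real data, plot all four curves rather than relying on a single band; the curve shapes often reveal distinct price-sensitive segments.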
Fake Pricing Page
Best for SaaS products testing tier structure and price sensitivity at scale.
Setup:
- Build a pricing page with real prices before the feature exists
- Add a CTA that collects intent (e.g., "Start free trial" or "Upgrade now")
- Intercept post-click with "coming soon" + waitlist signup
- Track: which tier gets the most clicks, overall conversion rate
Sample size: 500+ unique visitors or 3 weeks of traffic
Success metric: Conversion rate to "upgrade intent" click
Kill threshold: < 2% conversion on target tier after 500 visitors means the price or tier structure needs rework.
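Before acting on the kill threshold, it is worth checking whether 500 visitors is actually enough to distinguish the observed rate from 2%. The sketch below uses a Wilson score interval (standard for binomial proportions; the specific visit counts are illustrative):

```python
import math

# Sketch: Wilson score confidence interval for a conversion rate,
# used to sanity-check the "< 2% after 500 visitors" kill signal.

def wilson_interval(conversions, visitors, z=1.96):
    p = conversions / visitors
    denom = 1 + z**2 / visitors
    center = (p + z**2 / (2 * visitors)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / visitors
                                     + z**2 / (4 * visitors**2))
    return center - margin, center + margin

lo, hi = wilson_interval(7, 500)  # 1.4% observed conversion
print(f"{lo:.4f} .. {hi:.4f}")
# The upper bound is still above 0.02, so killing at 7/500 would be
# premature -- collect more traffic before deciding.
```

A threshold stated without a sample size is not a decision rule; the interval tells you when the data has actually resolved the question.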
Feasibility Experiments
Question answered: Can we actually build this? Do we have the data, APIs, compute, or skills required?
Technical Spike
Best for validating novel technical integrations or data assumptions.
When to use: When engineering believes there is meaningful uncertainty about whether something can be built, or built within acceptable performance bounds.
Setup:
- Time-box to 2-5 days
- One engineer, one specific question to answer
- Produce a decision artifact (code snippet, benchmark result, or written finding)
Template:
Technical Spike Brief
=====================
Question to answer:
[One specific technical question]
e.g., "Can we classify transaction categories with > 90% accuracy
using only the merchant name field from our existing data?"
Definition of done:
[Concrete deliverable]
e.g., "A notebook showing precision/recall on a sample of 1,000
real transactions from our production database"
Time box: [2-5 days]
Success signal: [Threshold to proceed]
e.g., "> 85% precision on unseen test set"
Kill signal: [Threshold to abandon this technical approach]
e.g., "< 70% precision OR requires data we don't have access to"
Data Audit
Best for testing the assumption that existing behavioral data can power a feature.
Setup:
- Define the exact data fields the solution requires
- Query existing data sources to measure completeness, quality, and availability
- Assess: coverage rate, recency, and accuracy on a sample
Template:
Data Audit Checklist
====================
Required data fields:
[ ] Field 1: [name, source, expected format]
[ ] Field 2: ...
For each field, measure:
Coverage rate: [% of records with this field populated]
Recency: [age of most recent record for typical user]
Accuracy check: [% correct on manually verified sample of 50]
Pass criteria: All required fields >= 80% coverage and >= 85% accuracy
Fail criteria: Any required field < 60% coverage -> infeasible as designed
Usability Experiments
Question answered: Can customers use this without friction? Can they complete the target task independently?
Think-Aloud Prototype Test
Best for testing navigation flows, information architecture, and task completion.
When to use: After a solution is designed but before engineering begins.
Setup:
- Prepare tasks (not leading questions - "find the export option", not "click export")
- Use Figma prototype, coded component, or paper sketch
- Record session (with consent)
Sample size: Testing with 5 participants reveals ~85% of usability issues. Run 5, fix the biggest issues, then run 5 more.
Facilitation template:
Think-Aloud Facilitation Guide
================================
Introduction (3 min):
"We're testing the design, not you. There are no wrong answers.
Please think out loud as you navigate - tell me what you're
looking at, what you're thinking, and what you expect to happen."
Per task:
Task prompt: [Action to complete without revealing how]
Observe: [Record hesitations, errors, and recovery attempts]
Post-task: "What did you expect to happen there?"
"What would you do next if this were real?"
Debrief (5 min):
"What was most confusing?"
"What felt natural or familiar?"
"Is there anything you expected to see that was missing?"
Metrics to track:
| Metric | How to measure | Target |
|---|---|---|
| Task completion rate | % of participants who complete without help | >= 80% |
| Time on task | Seconds from task start to completion | Benchmark vs. current state |
| Error rate | Number of wrong clicks before correction | <= 2 per task |
| Confusion moments | Count of pauses > 5 sec or backtracking | 0 per critical path |
Kill threshold: < 60% task completion rate OR 3+ participants cannot complete the primary task = redesign before engineering.
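The targets and kill thresholds above can be rolled into a single verdict per test round. The session fields below (`completed`, `errors`) are illustrative; adapt them to your own note-taking format:

```python
# Sketch: turn think-aloud session notes into a proceed/redesign verdict
# using the targets from the metrics table.

def usability_verdict(sessions):
    """sessions: list of dicts with 'completed' (bool) and 'errors' (int)."""
    n = len(sessions)
    completed = sum(s["completed"] for s in sessions)
    completion_rate = completed / n
    if completion_rate < 0.6 or (n - completed) >= 3:
        return "redesign", completion_rate      # kill threshold hit
    if completion_rate >= 0.8 and all(s["errors"] <= 2 for s in sessions):
        return "proceed", completion_rate       # all targets met
    return "fix-and-retest", completion_rate    # partial pass

sessions = [{"completed": True, "errors": 1},
            {"completed": True, "errors": 0},
            {"completed": False, "errors": 4},
            {"completed": True, "errors": 2},
            {"completed": True, "errors": 1}]
print(usability_verdict(sessions))  # 4/5 complete -> ("fix-and-retest", 0.8)
```

The middle verdict matters: an 80% completion rate with one high-error session means fix the specific issue and rerun, not ship.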
First-Click Test
Best for testing information architecture and labeling decisions quickly.
Setup:
- Show a static screenshot or wireframe
- Ask: "Where would you click to [accomplish task]?"
- Measure: where users click and how long it takes (< 5 sec = confident, > 15 sec = confused)
Sample size: 30-50 participants (can be done async via Maze, Optimal Workshop)
Success threshold: >= 70% correct first click on the target element.
Kill threshold: < 50% correct first click = rename the element or restructure navigation.
Experiment Selection Guide
Use this table to quickly choose the right experiment for your context:
| Assumption type | Time available | Best experiment | Min sample |
|---|---|---|---|
| Desirability | 1-3 days | 5 customer interviews | 5 people |
| Desirability | 1-2 weeks | Smoke test / fake door | 200 visitors |
| Desirability | 1 week | Concept test survey | 100 responses |
| Viability | 2-3 days | Willingness-to-pay interview | 15 people |
| Viability | 2-3 weeks | Fake pricing page | 500 visitors |
| Feasibility | 2-5 days | Technical spike | 1 engineer |
| Feasibility | 1 day | Data audit | Existing data |
| Usability | 1-3 days | Think-aloud test (5 users) | 5 people |
| Usability | 1 week | First-click test (async) | 30 people |
Decision rule: Always start with the fastest, cheapest experiment that could change your mind. Only escalate to a more expensive experiment if the cheaper one produces an ambiguous result.
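The selection table and decision rule can be encoded as a cheapest-first lookup, so the "start with the fastest experiment that could change your mind" rule falls out of a linear scan (this is just the table above restated as data, not a separate methodology):

```python
# Sketch: experiment selection table, ordered cheapest-first per
# assumption type so the first match is the fastest viable experiment.

EXPERIMENTS = [  # (assumption type, days needed, experiment, min sample)
    ("desirability", 3,  "5 customer interviews",        "5 people"),
    ("desirability", 7,  "concept test survey",          "100 responses"),
    ("desirability", 14, "smoke test / fake door",       "200 visitors"),
    ("viability",    3,  "willingness-to-pay interview", "15 people"),
    ("viability",    21, "fake pricing page",            "500 visitors"),
    ("feasibility",  1,  "data audit",                   "existing data"),
    ("feasibility",  5,  "technical spike",              "1 engineer"),
    ("usability",    3,  "think-aloud test (5 users)",   "5 people"),
    ("usability",    7,  "first-click test (async)",     "30 people"),
]

def pick_experiment(assumption_type, days_available):
    for kind, days, name, sample in EXPERIMENTS:
        if kind == assumption_type and days <= days_available:
            return name, sample
    return None  # no experiment fits the time budget

print(pick_experiment("desirability", 10))  # -> ("5 customer interviews", "5 people")
```

Escalation after an ambiguous result is just the same scan resumed from the next entry for that assumption type.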
jtbd-framework.md
Jobs-to-be-Done Framework
Job statement syntax
A well-formed job statement follows this structure:
When [situation/context],
I want to [motivation/desired progress],
so I can [expected outcome].
Rules for writing job statements:
- The situation must describe a real, recurring circumstance - not a one-off event
- The motivation must be solution-agnostic - describe the progress, not a feature
- The outcome should connect to a functional, emotional, or social benefit
- The statement should be testable: you can observe whether someone is in this situation
- Aim for the "Goldilocks level" - not too abstract ("live better") and not too specific ("click a button")
Example - good vs. bad:
| Quality | Statement |
|---|---|
| Too abstract | "When managing my life, I want to feel less stressed, so I can be happy." |
| Too specific | "When I open the app, I want to see a pie chart, so I can check my categories." |
| Just right | "When I receive my monthly paycheck, I want to quickly allocate it across bills, savings, and discretionary spending, so I can avoid overdrafts and make progress toward my savings goal." |
The three layers of a job
Every job has functional, emotional, and social dimensions. Interviewing for all three produces richer insight than functional alone.
Functional job
The practical task the customer needs to accomplish. This is what most teams focus on and what product requirements typically capture.
- Directly observable
- Measured by speed, accuracy, thoroughness
- Example: "Reconcile bank transactions with accounting records"
Emotional job
How the customer wants to feel during and after completing the job. Often the real driver of switching behavior - people leave products that make them feel anxious or incompetent even if the product technically works.
- Not directly observable - must be inferred from language and behavior
- Measured by confidence, satisfaction, peace of mind
- Example: "Feel confident that I am not missing any expenses at tax time"
Social job
How the customer wants to be perceived by others. Social jobs are especially powerful in B2B contexts where purchasing decisions are visible to colleagues and managers.
- Context-dependent - the same person has different social jobs at work vs. home
- Measured by perception, status, belonging
- Example: "Be seen by my CFO as someone who has the financials under control"
Switch interviews (forces of progress)
Switch interviews are retrospective interviews conducted after someone has recently switched to or from a product. They reveal the four forces that drive or inhibit switching:
DEMAND SIDE (pulls toward switch)
=================================
Push of current situation   +   Pull of new solution
(dissatisfaction)               (attraction)
vs.
SUPPLY SIDE (resists switch)
============================
Anxiety of new solution   +   Habit of current situation
(uncertainty)                 (inertia)
The four forces
| Force | Direction | Interview question |
|---|---|---|
| Push | Toward switching | "What was going wrong with your old way of doing this?" |
| Pull | Toward switching | "What attracted you to the new solution? What did you hope it would do?" |
| Anxiety | Against switching | "What concerns did you have before switching? What almost stopped you?" |
| Habit | Against switching | "What did you like about the old way? What was comfortable about it?" |
Running a switch interview
- Recruit recent switchers - People who switched within the last 90 days. Memory fades quickly; the emotional context disappears first.
- Timeline first - Map the chronological journey: first thought of switching, research phase, decision moment, first use, ongoing use. Get exact dates when possible.
- Probe each force - At each phase, ask about pushes, pulls, anxieties, and habits.
- Look for the "struggling moment" - The specific event that moved someone from passive dissatisfaction to active searching. This is your highest-value insight.
- Document the forces diagram - After the interview, fill in the four-forces diagram with direct quotes from the participant.
Patterns to look for across interviews
- Consistent pushes across 4+ participants indicate a market-level problem, not an individual complaint
- Strong anxieties suggest your marketing and onboarding must address specific fears
- Weak pulls mean your value proposition is not landing - people are leaving the old solution, not choosing yours
- Strong habits indicate you need migration tools, data import, or familiar UX patterns
Outcome-driven innovation (ODI)
ODI, developed by Tony Ulwick, quantifies jobs-to-be-done by measuring customer-defined outcomes. It produces a numeric score for each outcome that reveals where customers are over-served (table stakes) and under-served (opportunities).
ODI process
- Define the job - Use the job statement syntax above
- Map the job steps - Break the job into 5-15 sequential steps (the "job map")
- Extract desired outcomes - For each step, ask: "What does success look like?"
Write outcomes in the format:
Minimize the [time/likelihood/effort] of [undesired outcome]
or
Increase the [speed/accuracy/reliability] of [desired outcome]
- Survey customers - For each outcome, ask two questions on a 1-5 scale:
- "How important is this outcome to you?" (importance)
- "How satisfied are you with how well current solutions achieve this?" (satisfaction)
- Calculate the opportunity score:
Opportunity = Importance + max(Importance - Satisfaction, 0)
- Score > 12: Under-served (high opportunity)
- Score 10-12: Appropriately served
- Score < 10: Over-served (table stakes, do not invest)
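The score bands above assume importance and satisfaction on a 0-10 scale (as in Ulwick's formulation); on a raw 1-5 scale the maximum possible score is 9, so the thresholds would never trigger. The sketch below assumes the common convention of doubling the 1-5 survey means, which is a choice to verify against your own survey design:

```python
# Sketch: ODI opportunity scoring from averaged 1-5 survey responses,
# rescaled to 0-10 before applying the formula.

def opportunity_score(importance, satisfaction):
    """importance/satisfaction: mean 1-5 ratings, doubled to a 0-10 scale."""
    imp, sat = importance * 2, satisfaction * 2
    return imp + max(imp - sat, 0)

def classify(score):
    if score > 12:
        return "under-served"
    if score >= 10:
        return "appropriately served"
    return "over-served"

score = opportunity_score(4.5, 2.0)   # very important, poorly satisfied
print(score, classify(score))  # -> 14.0 under-served
```

Note that `max(..., 0)` means over-satisfaction never subtracts from importance: an outcome customers care about is never penalized for being well served, it just earns no bonus.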
Job map template
A job map captures the universal steps a customer goes through when executing a job, regardless of which solution they use:
| Step | Description | Example (personal budgeting) |
|---|---|---|
| 1. Define | Determine goals and plan approach | Decide savings target and budget categories |
| 2. Locate | Gather needed inputs and resources | Collect bank statements, receipts, income records |
| 3. Prepare | Set up the environment for execution | Open spreadsheet/app, set up categories |
| 4. Confirm | Verify readiness before proceeding | Check that all transactions are imported |
| 5. Execute | Perform the core task | Categorize transactions, allocate amounts |
| 6. Monitor | Track progress during execution | Watch running totals, compare to budget |
| 7. Modify | Make adjustments based on monitoring | Reallocate from one category to another |
| 8. Conclude | Finish and document the output | Save the budget, set reminders for next period |
Each step generates 3-8 desired outcomes. A complete job map typically produces 50-150 outcomes to survey.
When to use which JTBD technique
| Technique | Best for | Effort | Output |
|---|---|---|---|
| Job statement writing | Framing the problem space | Low (1-2 hours) | Shared language for the team |
| Switch interviews | Understanding why people change solutions | Medium (5-8 interviews) | Forces diagram, struggling moments |
| ODI scoring | Quantitative prioritization of outcomes | High (survey of 100+ users) | Ranked opportunity scores |
| Job mapping | Detailed process understanding | Medium (workshop + interviews) | Step-by-step outcome list |
opportunity-solution-trees.md
Opportunity Solution Trees
The opportunity solution tree (OST) is a visual decision-making tool created by Teresa Torres as part of the continuous discovery framework. It connects a desired product outcome to customer-discovered opportunities, candidate solutions, and assumption tests. The tree makes the team's thinking visible and ensures that every feature can be traced back to real customer evidence.
Tree structure
[Desired Outcome]
|
+-- [Opportunity 1]
| +-- [Solution 1a]
| | +-- [Assumption test 1]
| | +-- [Assumption test 2]
| +-- [Solution 1b]
| +-- [Assumption test 3]
|
+-- [Opportunity 2]
| +-- [Solution 2a]
| +-- [Solution 2b]
| +-- [Solution 2c]
|
+-- [Opportunity 3]
+-- [Solution 3a]
+-- [Assumption test 4]
Each level of the tree has specific rules:
Level 1: Desired outcome
The root of the tree is a single, measurable product outcome that the team has been assigned to improve.
Rules for good outcomes:
- Must be a metric the product team can directly influence (not revenue or NPS alone)
- Must be measurable with existing or achievable instrumentation
- Should be a leading indicator, not a lagging one
- Must have a current baseline and a target
Good outcomes:
- "Increase 7-day activation rate from 30% to 50%"
- "Reduce time-to-first-value from 12 minutes to under 3 minutes"
- "Increase weekly active usage from 2.1 to 3.5 sessions per user"
Bad outcomes:
- "Increase revenue" (too broad, team cannot directly influence)
- "Make users happy" (not measurable)
- "Ship the new dashboard" (this is an output, not an outcome)
Level 2: Opportunities
Opportunities are unmet customer needs, pain points, or desires discovered through research. They represent the problem space - the "why" behind the metric.
Where opportunities come from
| Source | How to extract opportunities |
|---|---|
| Customer interviews | Identify struggling moments, workarounds, and unmet needs |
| Support tickets | Cluster recurring complaints into themes |
| Behavioral analytics | Find drop-off points, rage clicks, abandoned flows |
| Sales call recordings | Note objections and feature requests; reframe as underlying needs |
| Usability tests | Identify tasks users cannot complete or complete with difficulty |
| NPS/CSAT verbatims | Theme open-text responses by underlying need |
Writing opportunity statements
Use customer language, not internal jargon. An opportunity describes a need, not a solution.
| Quality | Example |
|---|---|
| Bad (solution) | "We need an AI categorizer" |
| Bad (vague) | "Users have trouble with categories" |
| Good (need) | "Users spend too much time manually categorizing transactions" |
| Good (need) | "Users miss important recurring charges because they get buried in the feed" |
Structuring the opportunity space
Opportunities can be nested. A parent opportunity represents a broad theme; child opportunities represent specific aspects of that theme.
Users struggle to understand where their money goes each month
|
+-- Manual transaction categorization takes too long
+-- Recurring charges are hard to identify and track
+-- Spending in one category bleeds into another without warning
Rules:
- Each parent should have 2-5 children (more means it needs further decomposition)
- Children should be mutually exclusive and collectively exhaustive (MECE) where possible
- Leaf opportunities should be specific enough to generate solution ideas
- Every opportunity must be traceable to at least one piece of customer evidence
Level 3: Solutions
Solutions are ideas for addressing a specific opportunity. They represent the "how."
Solution ideation rules
- Generate at least 3 solutions per opportunity - This prevents premature commitment and single-option bias. Bad ideas in the mix actually improve decision quality by creating contrast.
- Keep solutions lightweight - A sentence or two is enough. No PRDs, no wireframes, no technical specs at this stage.
- Vary the solution types - Include at least one "boring" solution (simple, proven pattern), one "creative" solution (novel approach), and one "remove" solution (what if we eliminated the problem instead of solving it?).
- Score solutions against the target outcome - Ask: "If this solution works perfectly, how much would it move the outcome metric?" Discard solutions with low potential impact.
Solution comparison template
OPPORTUNITY: [statement]
TARGET OUTCOME: [metric]
| Criteria | Solution A | Solution B | Solution C |
|-------------------|-------------------|-------------------|-------------------|
| Impact on outcome | High / Med / Low | High / Med / Low | High / Med / Low |
| Reach | All / Segment | All / Segment | All / Segment |
| Confidence | High / Med / Low | High / Med / Low | High / Med / Low |
| Effort | S / M / L / XL | S / M / L / XL | S / M / L / XL |
| Riskiest assumption | [what] | [what] | [what] |
Level 4: Assumption tests (experiments)
Each solution carries assumptions. The tree branches into experiments that test the riskiest assumptions with the cheapest possible method.
From solution to experiment
- List all assumptions for the chosen solution (use the four categories: desirability, viability, feasibility, usability)
- Identify the one assumption that, if wrong, would make the entire solution worthless
- Design the cheapest experiment that produces a clear yes/no signal on that assumption
- Define success criteria and kill criteria before running the experiment
- Run the experiment, update the tree with the result, and decide: pursue, pivot, or stop
See references/assumption-testing.md for the full experiment catalog.
Maintaining the tree over time
Weekly update cadence
The product trio should update the OST weekly during their continuous discovery ritual:
- Add new opportunities from this week's customer touchpoints
- Retire invalidated solutions based on experiment results
- Promote validated solutions to the delivery backlog
- Rebalance the tree - if one branch is getting all the attention, check whether other high-priority opportunities are being neglected
Tree health checks
Run these checks monthly:
| Check | Healthy sign | Unhealthy sign |
|---|---|---|
| Evidence freshness | All opportunities cite evidence from last 90 days | Opportunities based on assumptions or ancient data |
| Solution diversity | 3+ solutions per active opportunity | Single solution committed without comparison |
| Experiment velocity | 1-2 experiments completed per week | No experiments run in the last month |
| Outcome connection | Every opportunity clearly ties to the outcome metric | Orphan opportunities with no clear metric link |
| Balanced exploration | Multiple branches being explored | All effort concentrated on one branch |
Archiving, not deleting
When an experiment invalidates a solution:
- Move the solution branch to an "archived" section of the tree
- Add a note: what was tested, what the result was, and what was learned
- The learning prevents future teams from re-testing the same failed assumption
Common OST anti-patterns
| Anti-pattern | Symptom | Fix |
|---|---|---|
| Solution-first tree | Solutions appear without parent opportunities | Interview customers; opportunities must come from evidence |
| Single-branch tree | Only one opportunity with one solution | Brainstorm harder; you are not exploring the space |
| Stale tree | No updates in 3+ weeks | Re-establish weekly cadence; the tree reflects active discovery |
| Output-as-outcome | Root node is "Ship feature X" | Replace with a measurable behavior change metric |
| Kitchen-sink tree | 20+ opportunities, none deeply explored | Prioritize 2-3 opportunities; depth beats breadth |
Frequently Asked Questions
What is product-discovery?
Use this skill when applying Jobs-to-be-Done, building opportunity solution trees, mapping assumptions, or validating product ideas. Triggers on product discovery, JTBD, jobs-to-be-done, opportunity solution trees, assumption mapping, experiment design, prototype testing, and any task requiring product discovery methodology.
How do I install product-discovery?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill product-discovery in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support product-discovery?
product-discovery works with claude-code, gemini-cli, openai-codex. Install it once and use it across any supported AI coding agent.