test-strategy
Use this skill when deciding what to test, choosing between test types, designing a testing strategy, or balancing test coverage. Triggers on test pyramid, unit vs integration vs e2e, contract testing, test coverage strategy, TDD, BDD, testing ROI, and any task requiring testing architecture decisions.
test-strategy is a production-ready AI agent skill for claude-code, gemini-cli, and openai-codex. It helps with deciding what to test, choosing between test types, designing a testing strategy, and balancing test coverage.
Quick Facts
| Field | Value |
|---|---|
| Category | engineering |
| Version | 0.1.0 |
| Platforms | claude-code, gemini-cli, openai-codex |
| License | MIT |
How to Install
- Make sure you have Node.js installed on your machine.
- Run the following command in your terminal:
```
npx skills add AbsolutelySkilled/AbsolutelySkilled --skill test-strategy
```
- The test-strategy skill is now available in your AI coding agent (Claude Code, Gemini CLI, OpenAI Codex, etc.).
Overview
A testing strategy answers three questions: what to test, at what level, and how much. Without a strategy, teams end up with either too many slow, brittle e2e tests or too few tests overall - both are expensive. This skill gives the judgment to design a test suite that provides high confidence, fast feedback, and low maintenance cost.
Tags
testing strategy test-pyramid tdd coverage quality
Platforms
- claude-code
- gemini-cli
- openai-codex
Frequently Asked Questions
What is test-strategy?
Use this skill when deciding what to test, choosing between test types, designing a testing strategy, or balancing test coverage. Triggers on test pyramid, unit vs integration vs e2e, contract testing, test coverage strategy, TDD, BDD, testing ROI, and any task requiring testing architecture decisions.
How do I install test-strategy?
Run `npx skills add AbsolutelySkilled/AbsolutelySkilled --skill test-strategy` in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support test-strategy?
This skill works with claude-code, gemini-cli, openai-codex. Install it once and use it across any supported AI coding agent.
Maintainers
Generated from AbsolutelySkilled
SKILL.md
Test Strategy
A testing strategy answers three questions: what to test, at what level, and how much. Without a strategy, teams end up with either too many slow, brittle e2e tests or too few tests overall - both are expensive. This skill gives the judgment to design a test suite that provides high confidence, fast feedback, and low maintenance cost.
When to use this skill
Trigger this skill when the user:
- Asks which type of test to write for a given scenario
- Wants to design a testing strategy for a new service or feature
- Needs to decide between unit, integration, and e2e tests
- Asks about test coverage targets or metrics
- Wants to implement contract testing between services
- Is dealing with flaky tests and needs a remediation plan
- Asks about TDD or BDD workflow
Do NOT trigger this skill for:
- Writing the actual test code syntax for a specific framework (defer to framework docs)
- Performance testing or load testing strategy (separate domain)
Key principles
Test behavior, not implementation - Tests should survive refactoring. If moving logic between private methods breaks your tests, the tests are testing the wrong thing. Test public contracts and observable outcomes.
The Testing Trophy over the pyramid - The classic pyramid (many unit, fewer integration, few e2e) was coined before modern tooling. The Trophy (Kent C. Dodds) weights integration tests most heavily: static analysis at the base, unit tests for isolated logic, integration tests for the bulk of coverage, and a few e2e tests for critical paths.
Fast feedback loops - A test suite that takes 30 minutes to run is a test suite that doesn't get run. Design for speed: unit tests in milliseconds, integration tests in seconds, e2e tests reserved for CI only.
Test at the right level - The cost of a test rises as you move up the stack (slower, more brittle, harder to debug). Test each concern at the lowest level that meaningfully exercises it.
Flaky tests are worse than no tests - A test that sometimes fails trains the team to ignore failures. A flaky test in CI delays every deploy. Fix or delete flaky tests immediately; never tolerate them.
Core concepts
Test types taxonomy
| Type | What it tests | Speed | Cost | Use for |
|---|---|---|---|---|
| Static | Type errors, lint violations | Instant | Near-zero | Type safety, obvious mistakes |
| Unit | Single function/class in isolation | < 10ms | Low | Pure logic, edge cases, algorithms |
| Integration | Multiple modules together with real dependencies | 100ms-2s | Medium | Service layer, DB queries, API handlers |
| E2E | Full user journey through deployed stack | 5-60s | High | Critical user paths, smoke tests |
| Contract | API contract between producer and consumer | Seconds | Medium | Microservice boundaries |
The Testing Trophy
```
           /\
          /e2e\          - Few: critical flows only
         /------\
        /  integ \       - Most: service + DB + API
       /----------\
      /    unit    \     - Some: pure logic and edge cases
     /--------------\
    /     static     \   - Always: types, lint, format
   /------------------\
```

The key insight is that integration tests give the best ROI for most application code: they test real behavior through real dependencies without the brittleness of e2e tests.
Test doubles
Use the minimum isolation necessary for the test's purpose:
| Double | When to use | Risk |
|---|---|---|
| Stub | Replace slow/unavailable dependency, return canned data | Low - no behavior coupling |
| Mock | Verify a side effect was triggered (email sent, event published) | Medium - couples to call signature |
| Spy | Observe calls without replacing behavior | Medium - couples to call count/args |
| Fake | Replace infrastructure with working in-memory version | Low - tests real behavior patterns |
Prefer fakes for infrastructure (in-memory DB, in-memory queue). Mocks should be reserved for side effects you cannot otherwise observe.
Coverage metrics
| Metric | What it measures | When to use |
|---|---|---|
| Line coverage | % of lines executed | Baseline floor, not a target |
| Branch coverage | % of conditional paths taken | Better for logic-heavy code |
| Mutation coverage | % of introduced bugs caught by tests | Gold standard for test quality |
Line coverage above ~80% has diminishing returns and creates perverse incentives. Mutation coverage reveals whether tests actually assert meaningful things.
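To make the perverse incentive concrete, here is a minimal, hypothetical sketch (the `isAdult` function and the simulated mutant are illustrative, not part of this skill) of how a test with full line coverage can still let a mutant survive:

```typescript
// A function with a boundary at exactly 18.
function isAdult(age: number): boolean {
  return age >= 18;
}

// A mutation tool would flip `>=` to `>`; this hand-written "mutant"
// simulates that single-character change.
function isAdultMutant(age: number): boolean {
  return age > 18;
}

// Weak test: executes every line (100% line coverage) but never probes the
// boundary, so both the original and the mutant pass it - the mutant survives.
const weakTestPasses = (f: (age: number) => boolean) =>
  f(30) === true && f(5) === false;

// Strong test: asserts on the boundary value itself, killing the mutant.
const strongTestPasses = (f: (age: number) => boolean) => f(18) === true;

console.log(weakTestPasses(isAdult));         // true
console.log(weakTestPasses(isAdultMutant));   // true  - mutant survives
console.log(strongTestPasses(isAdult));       // true
console.log(strongTestPasses(isAdultMutant)); // false - mutant killed
```

A surviving mutant like this is exactly the signal tools such as Stryker report: code that is executed but not meaningfully asserted.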
Common tasks
Choose the right test type - decision matrix
When deciding what level to test something at, apply this logic:

```
Is this pure logic with no external dependencies?
  YES → Unit test
  NO → Does it require a real DB / HTTP call / file system?
    YES → Integration test (use real infrastructure or a fast fake)
    NO → Does it span multiple services or require a browser?
      YES → E2E test (sparingly)
      NO → Integration test
```

Additional rules:
- Cross-service API boundaries → Contract test (Pact or similar)
- Complex UI interaction that cannot be tested at component level → E2E
- Algorithm with many edge cases → Unit test per edge case + one integration
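The decision flow above can be sketched as a small helper; the function and field names here are illustrative only, not an API this skill provides:

```typescript
type TestLevel = 'unit' | 'integration' | 'e2e';

// Hypothetical shape describing the concern under test.
interface Concern {
  pureLogic: boolean;             // no external dependencies at all
  needsRealInfra: boolean;        // real DB / HTTP call / file system
  spansServicesOrBrowser: boolean; // crosses service boundaries or needs a browser
}

function chooseTestLevel(c: Concern): TestLevel {
  if (c.pureLogic) return 'unit';
  if (c.needsRealInfra) return 'integration'; // real infra or a fast fake
  if (c.spansServicesOrBrowser) return 'e2e'; // sparingly
  return 'integration';                       // default: best ROI
}

console.log(chooseTestLevel({
  pureLogic: true,
  needsRealInfra: false,
  spansServicesOrBrowser: false,
})); // 'unit'
```

Encoding the matrix as data like this can also be useful in team docs, where the default branch makes the "integration by default" stance explicit.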
Design a test suite for a new service
Structure the test suite before writing the first line of code:
- Map the test surface - Identify all external I/O: databases, queues, HTTP clients, file system. These are the integration seams.
- Choose infrastructure strategy - Real DB with test containers, in-memory fake, or Docker Compose. Prefer real DBs for schema-heavy services.
- Define the testing trophy for your context - Decide the ratio before you write tests. A typical distribution: 60% integration, 30% unit, 10% e2e.
- Set up test data factories - Centralize how test objects are created. Factories prevent fragile fixtures and make tests self-documenting.
- Wire CI from day one - Tests that only run locally drift. Run unit + integration in every PR, e2e in pre-merge or nightly.
Write effective unit tests - patterns
Unit tests work best for:
- Pure functions (same input always gives same output)
- Complex conditional logic with many branches
- Data transformations and parsing
- Domain model invariants
Arrange-Act-Assert structure:
```typescript
test('applies 10% discount for orders over $100', () => {
  // Arrange
  const order = buildOrder({ subtotal: 120 });
  // Act
  const discounted = applyLoyaltyDiscount(order);
  // Assert
  expect(discounted.total).toBe(108);
});
```

Parameterize boundary conditions:
```typescript
test.each([
  [99, 0],   // just below threshold - no discount
  [100, 10], // exactly at threshold
  [200, 20], // above threshold
])('order of $%i gets $%i discount', (subtotal, expectedDiscount) => {
  const order = buildOrder({ subtotal });
  expect(applyLoyaltyDiscount(order).discount).toBe(expectedDiscount);
});
```

See `references/test-patterns.md` for more patterns.
Write integration tests - database and API
For database integration tests:
```typescript
// Use real DB, roll back after each test
beforeEach(() => db.beginTransaction());
afterEach(() => db.rollbackTransaction());

test('saves user and returns with id', async () => {
  const user = await userRepo.create({ name: 'Alice', email: 'alice@test.com' });
  expect(user.id).toBeDefined();
  const found = await userRepo.findById(user.id);
  expect(found.name).toBe('Alice');
});
```

For HTTP API integration tests, test the full request cycle:
```typescript
test('POST /orders returns 201 with order id', async () => {
  const response = await request(app)
    .post('/orders')
    .send({ items: [{ productId: 'p1', qty: 2 }] });
  expect(response.status).toBe(201);
  expect(response.body.orderId).toBeDefined();
});
```

Test the unhappy paths equally: 400 for invalid input, 401 for missing auth, 404 for missing resource, 409 for conflicts.
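One way to make those unhappy paths cheap to cover is to pull request validation into a pure helper and unit-test the 400/409 branches directly. This is an illustrative sketch only; the types, helper name, and idempotency-key check are invented, not part of this skill:

```typescript
interface OrderItem { productId: string; qty: number }
interface OrderRequest { items?: OrderItem[] }
interface ValidationResult { status: number; error?: string }

// Hypothetical pure validator: no HTTP server needed to exercise every branch.
function validateOrderRequest(
  req: OrderRequest,
  seenKeys: Set<string>,      // previously processed idempotency keys
  idempotencyKey?: string,
): ValidationResult {
  if (!req.items || req.items.length === 0) {
    return { status: 400, error: 'items required' };
  }
  if (req.items.some(i => i.qty <= 0)) {
    return { status: 400, error: 'qty must be positive' };
  }
  if (idempotencyKey && seenKeys.has(idempotencyKey)) {
    return { status: 409, error: 'duplicate submission' };
  }
  return { status: 201 };
}
```

The HTTP-level integration test then only needs to confirm the wiring (one happy path, one 400), while the validator's branches are covered at unit speed.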
Implement contract testing between services
Contract testing decouples service teams without sacrificing confidence. The consumer defines what it expects; the provider proves it can deliver.
Pact workflow:
- Consumer writes a pact test defining the expected request/response shape
- Running the consumer test generates a pact file (JSON contract)
- Provider runs a pact verification test against that contract
- Both upload results to a Pact Broker; `can-i-deploy` gates deployment
Key rules:
- The consumer owns the contract, not the provider
- Contracts test shape and semantics, not business logic
- Never test every field - only what the consumer actually uses
Measure and improve test quality - not just coverage
Line coverage is a floor, not a ceiling. Use these signals instead:
- Mutation score - Run a mutation testing tool (Stryker, PITest). If removing a `> 0` check doesn't kill any test, your tests aren't asserting enough.
- Test failure rate - Track which tests fail in CI over time. Tests that never fail on a production bug aren't exercising real risk.
- Test change frequency - Tests that change every time production code changes are testing implementation, not behavior.
- Time to red - How quickly does the suite tell you when something breaks? Optimize for signal speed, not raw pass/fail.
Handle flaky tests systematically
Never re-run a flaky test and call it fixed. Follow this protocol:
- Quarantine immediately - Move the flaky test to a separate suite that runs but doesn't block CI. Don't delete it - you'll lose the signal.
- Diagnose the root cause - Common causes:
- Shared mutable state between tests (missing cleanup)
- Time-dependent assertions (`Date.now()`, `setTimeout`)
- Race conditions in async tests (missing `await`)
- External service calls that should be stubbed
- Test order dependency
- Fix the root cause - If time-dependent: freeze time with a clock fake. If shared state: isolate in beforeEach/afterEach. If async: await properly.
- Un-quarantine and monitor - After the fix, restore to main suite and watch for a week of clean runs before declaring victory.
Anti-patterns
| Anti-pattern | Problem | What to do instead |
|---|---|---|
| Testing the framework | `expect(orm.save).toHaveBeenCalled()` tests that the ORM is wired, not that data was saved | Assert the actual state after the operation |
| Snapshot testing everything | Snapshot tests fail on any UI change, creating noise and review fatigue | Use snapshots only for serialized output you rarely change (e.g., generated JSON schema) |
| 100% coverage target | Creates tests that execute code without asserting anything meaningful | Set mutation score targets instead; aim for critical-path coverage |
| Giant test setup | Hundreds of lines of arrange code obscures what's actually being tested | Use builder/factory patterns; set only the fields that matter to the specific test |
| Mocking what you don't own | Mocking third-party libraries breaks on upgrades and doesn't test actual integration | Write a thin adapter you own, then mock your adapter |
| Skipping the testing pyramid for greenfield | Starting with e2e tests "because they test everything" leads to slow, brittle suites | Build bottom-up: unit tests first, integration second, e2e last |
Gotchas
Testing implementation details breaks on every refactor - Tests that assert internal function calls or private state are coupled to the how, not the what. When you move logic between files or rename functions, these tests fail even though nothing broke. Test through the public API and observable outputs only.
Transaction rollback in DB tests does not catch commit-time failures - Rolling back a transaction after each test is fast but skips any constraints or triggers that only fire on COMMIT (e.g., deferred foreign key checks in PostgreSQL). For critical paths, run at least a subset of tests against a real transaction that commits.
Mutation testing tools report false positives on unreachable branches - Some generated mutants will be in dead code paths that are never exercised. A "survived mutant" in code guarded by a feature flag or error path that is structurally unreachable is not a test gap. Review mutation reports in context.
Quarantined flaky tests accumulate and are never fixed - Moving tests to a `flaky` folder or CI job without an SLA for fixing them creates a graveyard. Set a 2-week SLA: fix or delete. A quarantine suite that grows signals a systemic problem, not isolated test issues.

100% line coverage does not mean the test suite is valuable - You can achieve 100% coverage with tests that assert nothing meaningful (just call the function). Mutation score is the real quality signal. If your 100% coverage suite has a 40% mutation score, 60% of your tests are noise.
References
For detailed content on specific topics, read the relevant file from references/:
- `references/test-patterns.md` - Common testing patterns: builders, fakes, parameterized tests, and when to use each
Only load a references file if the current task requires deep detail on that topic.
References
test-patterns.md
Test Patterns
A catalog of reusable patterns that make tests easier to write, read, and maintain. Each pattern solves a recurring problem in test design.
Builder Pattern (Object Mother variant)
Problem: Tests need complex objects with 10+ fields; most tests only care about 1-2 fields. Giant constructors or fixture files make tests hard to read.
Pattern: Create a builder function that provides sensible defaults. Override only the fields that matter to the specific test.
```typescript
// Builder with defaults
function buildUser(overrides: Partial<User> = {}): User {
  return {
    id: 'user-1',
    name: 'Alice',
    email: 'alice@example.com',
    role: 'member',
    isActive: true,
    createdAt: new Date('2024-01-01'),
    ...overrides,
  };
}

// Tests only specify what matters
test('inactive users cannot log in', () => {
  const user = buildUser({ isActive: false });
  expect(canLogin(user)).toBe(false);
});

test('admin users can access settings', () => {
  const user = buildUser({ role: 'admin' });
  expect(canAccessSettings(user)).toBe(true);
});
```

When to use: Any time test setup requires more than 3-4 fields.
In-Memory Fake
Problem: Mocking a repository method-by-method is tedious and couples tests to implementation. Real DBs are slow to spin up in unit test contexts.
Pattern: Implement a working in-memory version of your repository interface. It behaves like the real thing but stores data in a plain array or map.
```typescript
interface UserRepository {
  findById(id: string): Promise<User | null>;
  save(user: User): Promise<User>;
  findByEmail(email: string): Promise<User | null>;
}

class InMemoryUserRepository implements UserRepository {
  private users: Map<string, User> = new Map();

  async findById(id: string): Promise<User | null> {
    return this.users.get(id) ?? null;
  }

  async save(user: User): Promise<User> {
    const saved = { ...user, id: user.id ?? randomId() };
    this.users.set(saved.id, saved);
    return saved;
  }

  async findByEmail(email: string): Promise<User | null> {
    return [...this.users.values()].find(u => u.email === email) ?? null;
  }

  // Test helper: seed data without going through save()
  seed(user: User): void {
    this.users.set(user.id, user);
  }
}
```

```typescript
// Service test uses the fake - no DB needed
test('cannot register duplicate email', async () => {
  const repo = new InMemoryUserRepository();
  repo.seed(buildUser({ email: 'alice@example.com' }));
  const service = new UserService(repo);
  await expect(service.register({ email: 'alice@example.com', name: 'Bob' }))
    .rejects.toThrow(DuplicateEmailError);
});
```

When to use: Service-layer tests that need a repository. Prefer over mocking every individual method.
Parameterized Tests
Problem: A function has many edge cases. Writing a separate test for each creates massive test files with mostly duplicate setup code.
Pattern: Use `test.each` (Jest/Vitest) or equivalent to express a table of inputs and expected outputs.

```typescript
describe('calculateShippingCost', () => {
  test.each([
    // [orderTotal, membershipTier, expectedCost]
    [10, 'free', 5.99],   // low value, no membership
    [50, 'free', 5.99],   // below free-shipping threshold
    [100, 'free', 0],     // meets free-shipping threshold
    [10, 'silver', 2.99], // silver discount
    [10, 'gold', 0],      // gold members always free
    [200, 'gold', 0],     // gold members - high value
  ])('order of $%i with %s tier → $%f shipping', (total, tier, expected) => {
    const order = buildOrder({ total, membershipTier: tier });
    expect(calculateShippingCost(order)).toBe(expected);
  });
});
```

When to use: Pure functions with boundary conditions, parsers, validators, price calculators. Avoid when the setup logic differs significantly between cases.
Test Clock (Time Control)
Problem: Code that calls `Date.now()`, `new Date()`, or `setTimeout` produces different results on different runs. Tests become flaky or must `sleep()` to wait for timers.
Pattern: Inject a clock abstraction or use your test framework's fake timer.
```typescript
// Option A: inject a clock (best for unit tests)
interface Clock {
  now(): Date;
}

const SESSION_TTL_MS = 30 * 60 * 1000; // 30-minute session lifetime

class SessionManager {
  constructor(private clock: Clock) {}

  isExpired(session: Session): boolean {
    const elapsed = this.clock.now().getTime() - session.createdAt.getTime();
    return elapsed >= SESSION_TTL_MS; // expired once the full TTL has elapsed
  }
}

// Test with a frozen clock
test('session expired after 30 minutes', () => {
  const fixedNow = new Date('2024-06-01T12:30:00Z');
  const clock = { now: () => fixedNow };
  const session = { createdAt: new Date('2024-06-01T12:00:00Z') };
  const manager = new SessionManager(clock);
  expect(manager.isExpired(session)).toBe(true);
});
```

```typescript
// Option B: fake timers (best for timer-based code)
test('retries after 5 second delay', async () => {
  vi.useFakeTimers();
  const send = vi.fn().mockRejectedValueOnce(new Error('timeout'));
  const promise = sendWithRetry(send);
  vi.advanceTimersByTime(5000);
  await promise;
  expect(send).toHaveBeenCalledTimes(2);
  vi.useRealTimers();
});
```

When to use: Any code that branches on the current time, schedules work, or handles TTL/expiry.
Boundary Test Checklist
When testing any function with inputs, cover these categories systematically:
| Category | Examples |
|---|---|
| Happy path | Typical valid input producing expected output |
| Empty / zero | "", 0, [], null, undefined |
| Minimum valid | 1, one-character string, single-element array |
| Maximum valid | Max integer, 255-char string, large array |
| Just below boundary | limit - 1 |
| At boundary | Exactly limit |
| Just above boundary | limit + 1 |
| Negative | -1, -Infinity |
| Type coercion | "1" vs 1, true vs 1 (for loosely typed languages) |
| Special chars | null bytes, \n, SQL injection strings, Unicode |
Not every function needs all categories - pick the ones that are plausible failure modes for the specific function.
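As an illustration, the checklist rows map naturally onto a plain data table that needs no framework at all; `clamp` here is an invented example with boundaries at 0 and 100, not a function from this skill:

```typescript
// Function under test: clamps a number into the [0, 100] range.
function clamp(n: number, min = 0, max = 100): number {
  return Math.min(max, Math.max(min, n));
}

// One row per checklist category that applies to this function.
const cases: [input: number, expected: number][] = [
  [50, 50],        // happy path
  [0, 0],          // at lower boundary
  [-1, 0],         // just below lower boundary
  [100, 100],      // at upper boundary
  [101, 100],      // just above upper boundary
  [-Infinity, 0],  // negative extreme
];

for (const [input, expected] of cases) {
  console.assert(clamp(input) === expected, `clamp(${input}) should be ${expected}`);
}
```

The same table drops straight into `test.each` when a framework is available; the point is that the checklist becomes data, so adding a category is one line, not one test function.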
Spy on Side Effects
Problem: You need to verify that a side effect was triggered (email sent, event published, audit logged) without building the full infrastructure.
Pattern: Use a spy or a recording fake for side-effect boundaries.
```typescript
class RecordingEmailSender {
  sentEmails: { to: string; subject: string; body: string }[] = [];

  async send(to: string, subject: string, body: string): Promise<void> {
    this.sentEmails.push({ to, subject, body });
  }

  // Test helper
  sentTo(email: string): boolean {
    return this.sentEmails.some(e => e.to === email);
  }
}

test('sends confirmation email after successful registration', async () => {
  const emailSender = new RecordingEmailSender();
  const service = new RegistrationService(repo, emailSender);
  await service.register({ email: 'alice@example.com', name: 'Alice' });
  expect(emailSender.sentTo('alice@example.com')).toBe(true);
  expect(emailSender.sentEmails[0].subject).toMatch(/welcome/i);
});
```

When to use: Email senders, event publishers, audit loggers, webhook dispatchers - any side effect that cannot be observed through the main return value.
Contract Test Pattern (Pact)
Problem: Two teams own different services. You need confidence the API contract holds without setting up both services in the same test run.
Pattern: Consumer-driven contract testing. The consumer writes an expected interaction; the pact file becomes the contract the provider must verify.
```typescript
// Consumer test (e.g., frontend calling user-service)
describe('user-service API contract', () => {
  it('returns user by id', async () => {
    await provider.addInteraction({
      state: 'user 123 exists',
      uponReceiving: 'a request for user 123',
      withRequest: {
        method: 'GET',
        path: '/users/123',
      },
      willRespondWith: {
        status: 200,
        headers: { 'Content-Type': 'application/json' },
        body: {
          id: like('123'),
          name: like('Alice'),
          email: like('alice@example.com'),
        },
      },
    });
    const user = await userServiceClient.getUser('123');
    expect(user.name).toBeDefined();
  });
});
```

Key rules:
- Use `like()` matchers, not exact values - contracts test shape, not data
- Only assert on fields the consumer actually uses
- Both teams run the pact verification in CI; `can-i-deploy` blocks mismatches
Test Data Isolation
Problem: Tests that share state (global variables, a shared database) pass individually but fail when run together due to ordering effects.
Patterns:
For databases - transaction rollback:
let db: Database;
beforeAll(async () => { db = await connectTestDb(); });
beforeEach(async () => { await db.query('BEGIN'); });
afterEach(async () => { await db.query('ROLLBACK'); });
afterAll(async () => { await db.disconnect(); });For shared module state - explicit reset:
// Singleton that accumulates state
let cache: Map<string, unknown>;
beforeEach(() => { cache = new Map(); });For global HTTP mocks (MSW):
beforeAll(() => server.listen({ onUnhandledRequest: 'error' }));
afterEach(() => server.resetHandlers()); // remove per-test overrides
afterAll(() => server.close());When to use: Every test suite that touches shared infrastructure. Isolation is not optional - it's the prerequisite for a reliable suite.