performance-engineering
Use this skill when profiling application performance, debugging memory leaks, optimizing latency, benchmarking code, or reducing resource consumption. Triggers on CPU profiling, memory profiling, flame graphs, garbage collection tuning, load testing, P99 latency, throughput optimization, bundle size reduction, and any task requiring performance analysis or optimization.
performance-engineering is a production-ready AI agent skill for claude-code, gemini-cli, and openai-codex. It covers profiling application performance, debugging memory leaks, optimizing latency, benchmarking code, and reducing resource consumption.
Quick Facts
| Field | Value |
|---|---|
| Category | engineering |
| Version | 0.1.0 |
| Platforms | claude-code, gemini-cli, openai-codex |
| License | MIT |
How to Install
- Make sure you have Node.js installed on your machine.
- Run the following command in your terminal:
npx skills add AbsolutelySkilled/AbsolutelySkilled --skill performance-engineering
- The performance-engineering skill is now available in your AI coding agent (Claude Code, Gemini CLI, OpenAI Codex, etc.).
Overview
A systematic framework for diagnosing, measuring, and improving application performance. This skill covers the full performance lifecycle - from identifying bottlenecks with profilers and flame graphs, to eliminating memory leaks with heap snapshots, to validating improvements with rigorous benchmarks. It applies across the stack: Node.js backend, browser frontend, and database query layer. The guiding philosophy is always measure first, optimize second.
Tags
performance profiling optimization benchmarking latency memory
Platforms
- claude-code
- gemini-cli
- openai-codex
Frequently Asked Questions
What is performance-engineering?
Use this skill when profiling application performance, debugging memory leaks, optimizing latency, benchmarking code, or reducing resource consumption. Triggers on CPU profiling, memory profiling, flame graphs, garbage collection tuning, load testing, P99 latency, throughput optimization, bundle size reduction, and any task requiring performance analysis or optimization.
How do I install performance-engineering?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill performance-engineering in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support performance-engineering?
This skill works with claude-code, gemini-cli, openai-codex. Install it once and use it across any supported AI coding agent.
Maintainers
Generated from AbsolutelySkilled
SKILL.md
Performance Engineering
A systematic framework for diagnosing, measuring, and improving application performance. This skill covers the full performance lifecycle - from identifying bottlenecks with profilers and flame graphs, to eliminating memory leaks with heap snapshots, to validating improvements with rigorous benchmarks. It applies across the stack: Node.js backend, browser frontend, and database query layer. The guiding philosophy is always measure first, optimize second.
When to use this skill
Trigger this skill when the user:
- Observes high P95/P99 latency or slow response times in production
- Reports memory growing unboundedly or OOM crashes
- Wants to profile CPU usage or generate a flame graph
- Needs to benchmark two implementations to decide between them
- Is investigating event loop blocking or long tasks in the browser
- Wants to reduce JavaScript bundle size, TTI, or Core Web Vitals scores
- Is tuning garbage collection, heap limits, or worker thread pools
- Needs to set up continuous performance monitoring or performance budgets
- Is debugging N+1 queries, slow database queries, or connection pool exhaustion
Do NOT trigger this skill for:
- General code quality refactors with no performance goal (use clean-code skill)
- Capacity planning and infrastructure scaling decisions (use backend-engineering skill)
Key principles
Measure first, always - Never optimize based on intuition. Instrument the code, collect data, and let profiler output tell you where time actually goes. Assumptions about bottlenecks are wrong more often than not.
Optimize the bottleneck, not the code - Amdahl's Law: speeding up a component that is 5% of total runtime yields at most 5% improvement. Find the dominant cost, fix that, then re-measure to find the new dominant cost. Repeat.
Set performance budgets upfront - Define what "fast enough" means before writing a line. A target of "P99 < 200ms" or "bundle < 150KB" creates a measurable pass/fail criterion. Without a budget, optimization is endless.
Test under realistic load - A function that takes 1ms with 10 users may take 800ms with 1000 concurrent users due to lock contention, cache pressure, or connection pool exhaustion. Always load-test against production-like data volumes.
Premature optimization is the root of all evil - (Knuth) Write correct, readable code first. Profile in a realistic environment. Only then optimize the measured hot path. Code that sacrifices clarity for unmeasured performance gains is technical debt.
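The "test under realistic load" principle can be sketched with nothing but Node built-ins. A hypothetical local demo (the tiny server, port choice, and request counts are illustrative - reach for a dedicated tool like autocannon, k6, or wrk for real load tests):

```typescript
import http from 'node:http';
import { performance } from 'node:perf_hooks';

// Fire `total` requests at `url` with `concurrency` in flight at once,
// then report tail latency rather than a single-call timing.
async function loadTest(url: string, concurrency: number, total: number) {
  const latencies: number[] = [];
  let issued = 0;
  const worker = async () => {
    while (issued < total) {
      issued++;
      const start = performance.now();
      await (await fetch(url)).text(); // Node 18+ ships a global fetch
      latencies.push(performance.now() - start);
    }
  };
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  latencies.sort((a, b) => a - b);
  return {
    p50: latencies[Math.floor(latencies.length * 0.5)],
    p99: latencies[Math.min(latencies.length - 1, Math.floor(latencies.length * 0.99))],
  };
}

// Demo server simulating 5-20ms of work per request
const server = http.createServer((_req, res) => {
  setTimeout(() => res.end('ok'), 5 + Math.random() * 15);
});
server.listen(0, async () => {
  const { port } = server.address() as { port: number };
  const { p50, p99 } = await loadTest(`http://127.0.0.1:${port}/`, 20, 200);
  console.log(`P50=${p50.toFixed(1)}ms P99=${p99.toFixed(1)}ms`);
  server.close();
  server.closeAllConnections?.(); // Node 18.2+: drop keep-alive sockets so the process exits
});
```

Raising `concurrency` here reveals queuing effects that a single sequential request never shows.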
Core concepts
Latency vs throughput - Latency is how long one request takes. Throughput is how many requests complete per second. Optimizing one does not automatically improve the other. A batching strategy can dramatically increase throughput while increasing individual request latency.
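A useful companion to this distinction is Little's Law (L = λ × W): average in-flight requests equal throughput times latency. A back-of-envelope sketch (the function name is illustrative):

```typescript
// Little's Law: in-flight requests = throughput (req/s) × latency (s).
// Handy for sizing connection pools and worker counts.
function inflightRequests(throughputRps: number, latencySeconds: number): number {
  return throughputRps * latencySeconds;
}

// 200 req/s at 250ms average latency keeps ~50 requests in flight,
// so a pool of 10 connections would queue heavily.
console.log(inflightRequests(200, 0.25)); // 50
```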
Percentiles (P50/P95/P99) - Averages hide outliers. P99 latency is the experience of 1 in 100 users. In high-traffic systems, the P99 user matters. Never report only averages - always report P50, P95, and P99 together.
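A minimal sketch of computing percentiles from recorded samples, using the nearest-rank method (names and sample values are illustrative):

```typescript
// Nearest-rank percentile over recorded latency samples (ms).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.min(sorted.length - 1, Math.max(0, idx))];
}

const latencies = [12, 14, 15, 15, 16, 18, 21, 45, 110, 900];
console.log(percentile(latencies, 50)); // 16
console.log(percentile(latencies, 99)); // 900
// The mean (~117ms) describes no real user: half saw ≤16ms, the worst saw 900ms.
```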
Flame graphs - A visualization of sampled call stacks where width represents time spent. Wide bars at the top of a flame are hot functions to optimize. Generated by 0x, clinic flame, or Chrome DevTools CPU profiler.
Heap snapshots - A point-in-time dump of all live objects in the JS heap. Compare two snapshots (before/after a suspected leak window) to find objects accumulating without being GC'd. Available in Chrome DevTools and Node.js v8.writeHeapSnapshot().
Profiler types - Sampling profilers (low overhead, statistical) vs instrumentation profilers (exact counts, higher overhead). Use sampling for production diagnosis, instrumentation for precise benchmark attribution.
Amdahl's Law - Max speedup = 1 / (1 - P + P/N) where P is the parallelizable fraction and N is the number of processors. A program that is 90% parallelizable has a theoretical max speedup of 10x regardless of how many cores you add.
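The bound can be checked directly; a small sketch of the formula:

```typescript
// Amdahl's Law: max speedup = 1 / ((1 - P) + P / N)
// where P is the parallelizable fraction and N the number of processors.
function amdahlSpeedup(p: number, n: number): number {
  return 1 / ((1 - p) + p / n);
}

console.log(amdahlSpeedup(0.9, 4).toFixed(2));        // "3.08"
console.log(amdahlSpeedup(0.9, Infinity).toFixed(2)); // "10.00" - the ceiling, no matter how many cores
```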
Common tasks
Profile CPU usage
Use Node.js built-in profiler or 0x for flame graphs:
# Built-in V8 profiler - generates isolate-*.log
node --prof server.js
# Run your load, then process the log
node --prof-process isolate-*.log > profile.txt

# 0x - generates interactive flame graph HTML
npx 0x -- node server.js
# Then apply load; 0x auto-generates flamegraph.html

In TypeScript, mark hot sections explicitly for DevTools profiling:
// Wrap suspected hot paths to isolate them in profiles
function processItems(items: Item[]): Result[] {
console.time('processItems');
const result = items.map(transform);
console.timeEnd('processItems');
return result;
}

For browser CPU profiling, open Chrome DevTools > Performance tab > Record while reproducing the slow interaction. Look for long tasks (>50ms) in the flame chart.
Debug memory leaks
Capture two heap snapshots - one before and one after a suspected leak window - then compare retained objects:
import { writeHeapSnapshot } from 'v8';
import { setInterval } from 'timers';
// Snapshot 1: baseline
writeHeapSnapshot(); // writes Heap-<pid>-<seq>.heapsnapshot
// Simulate load / time passing
await runWorkload();
// Snapshot 2: after suspected leak
writeHeapSnapshot();
// Load both files in Chrome DevTools > Memory > Compare snapshots

Avoid closure-based leaks by using WeakRef and FinalizationRegistry for
optional references that should not prevent GC:
class Cache {
private store = new Map<string, WeakRef<object>>();
private registry = new FinalizationRegistry((key: string) => {
this.store.delete(key); // auto-cleanup when value is GC'd
});
set(key: string, value: object): void {
this.store.set(key, new WeakRef(value));
this.registry.register(value, key);
}
get(key: string): object | undefined {
return this.store.get(key)?.deref();
}
}

Common leak sources: event listeners never removed, global maps/sets that grow forever, closures capturing large objects, and timers/intervals not cleared.
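The first leak source - listeners that are never removed - can be demonstrated in isolation. A small sketch (the event name and subscribe helpers are hypothetical):

```typescript
import { EventEmitter } from 'node:events';

const bus = new EventEmitter();
bus.setMaxListeners(0); // silence the >10-listener warning for this demo

// Leaky: the handler closure is registered and never released
function subscribeLeaky(): void {
  bus.on('tick', () => { /* handler kept alive forever */ });
}

// Safe: return an unsubscribe function the caller must invoke
function subscribeSafely(): () => void {
  const handler = () => { /* ... */ };
  bus.on('tick', handler);
  return () => bus.off('tick', handler);
}

for (let i = 0; i < 100; i++) subscribeLeaky();
console.log(bus.listenerCount('tick')); // 100 - grows without bound

const unsubscribe = subscribeSafely();
unsubscribe();
console.log(bus.listenerCount('tick')); // still 100 - the safe path cleaned up after itself
```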
Benchmark code
Proper microbenchmarking requires warmup to let V8 JIT compile, multiple iterations to reduce noise, and statistical comparison:
import Benchmark from 'benchmark';
const suite = new Benchmark.Suite();
suite
.add('Array.from', () => {
Array.from({ length: 1000 }, (_, i) => i * 2);
})
.add('for loop', () => {
const arr: number[] = new Array(1000);
for (let i = 0; i < 1000; i++) arr[i] = i * 2;
})
.on('cycle', (event: Benchmark.Event) => {
console.log(String(event.target));
})
.on('complete', function (this: Benchmark.Suite) {
console.log('Fastest: ' + this.filter('fastest').map('name'));
})
  .run({ async: true });

Rules for valid microbenchmarks:
- Warmup at least 3 iterations before measuring
- Run for at least 1 second per case to smooth JIT variance
- Prevent dead-code elimination - consume the result
- Test with realistic input size and shape
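Those rules can also be applied without a library. A hand-rolled sketch using only perf_hooks (iteration counts are illustrative - benchmark.js or tinybench add proper statistics on top of this):

```typescript
import { performance } from 'node:perf_hooks';

let sink = 0; // consuming every result here defeats dead-code elimination

function bench(name: string, fn: () => number, warmup = 3, runs = 1000): number {
  for (let i = 0; i < warmup; i++) sink += fn(); // warmup lets the JIT compile the hot path
  const start = performance.now();
  for (let i = 0; i < runs; i++) sink += fn();
  const perOpMs = (performance.now() - start) / runs;
  console.log(`${name}: ${(perOpMs * 1000).toFixed(2)}µs/op`);
  return perOpMs;
}

bench('Array.from', () => Array.from({ length: 1000 }, (_, i) => i * 2)[999]);
bench('for loop', () => {
  const arr = new Array<number>(1000);
  for (let i = 0; i < 1000; i++) arr[i] = i * 2;
  return arr[999];
});
```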
Optimize Node.js event loop
Detect blocking with clinic bubbleprof or manual measurement:
import { performance, PerformanceObserver } from 'perf_hooks';
// Detect event loop lag
let lastCheck = Date.now();
setInterval(() => {
const lag = Date.now() - lastCheck - 100; // expected 100ms
if (lag > 50) console.warn(`Event loop lag: ${lag}ms`);
lastCheck = Date.now();
}, 100).unref();

Move CPU-intensive work off the main thread with worker threads:
import { Worker, isMainThread, parentPort, workerData } from 'worker_threads';
// main-thread side
function runCPUTask(data: unknown): Promise<unknown> {
return new Promise((resolve, reject) => {
const worker = new Worker(__filename, { workerData: data });
worker.on('message', resolve);
worker.on('error', reject);
});
}
// worker side
if (!isMainThread) {
const result = heavyComputation(workerData);
parentPort?.postMessage(result);
}

Reduce frontend bundle size
Audit bundle composition first, then fix the biggest wins:
# Visualize what's in your bundle
npx webpack-bundle-analyzer stats.json
# or for Vite:
npx vite-bundle-visualizer

Apply tree shaking with named imports:
// Bad - imports entire lodash (~70KB)
import _ from 'lodash';
const result = _.debounce(fn, 300);
// Good - imports only debounce (~2KB)
import debounce from 'lodash/debounce';
const result = debounce(fn, 300);

Use dynamic imports for code splitting at route boundaries:
// React lazy loading - splits route into separate chunk
import { lazy, Suspense } from 'react';
const Dashboard = lazy(() => import('./pages/Dashboard'));
function App() {
return (
<Suspense fallback={<Spinner />}>
<Dashboard />
</Suspense>
);
}

Set up performance monitoring
Track Core Web Vitals with the web-vitals library:
import { onCLS, onINP, onLCP, onFCP, onTTFB } from 'web-vitals';
function sendToAnalytics(metric: { name: string; value: number; rating: string }) {
navigator.sendBeacon('/analytics', JSON.stringify(metric));
}
onCLS(sendToAnalytics); // Cumulative Layout Shift - target < 0.1
onINP(sendToAnalytics); // Interaction to Next Paint - target < 200ms
onLCP(sendToAnalytics); // Largest Contentful Paint - target < 2.5s
onFCP(sendToAnalytics);
onTTFB(sendToAnalytics);

Add custom server-side timing for API endpoints:
import { performance } from 'perf_hooks';
function withTiming<T>(name: string, fn: () => Promise<T>): Promise<T> {
const start = performance.now();
return fn().finally(() => {
const duration = performance.now() - start;
metrics.histogram(name, duration); // send to Datadog/Prometheus
});
}
// Usage
const user = await withTiming('db.getUser', () => db.users.findById(id));

Optimize database query performance
Fix N+1 queries by batching with DataLoader:
import DataLoader from 'dataloader';
// Without DataLoader: 1 query per user = N+1
// With DataLoader: batches into 1 query per tick
const userLoader = new DataLoader(async (ids: readonly string[]) => {
const users = await db.users.findMany({ where: { id: { in: [...ids] } } });
const map = new Map(users.map((u) => [u.id, u]));
return ids.map((id) => map.get(id) ?? null);
});
// Each call is automatically batched
const user = await userLoader.load(userId);

Use connection pooling and avoid pool exhaustion:
import { Pool } from 'pg';
const pool = new Pool({
max: 20, // max connections - tune to (2 * CPU cores + 1) as starting point
idleTimeoutMillis: 30_000,
connectionTimeoutMillis: 2_000,
});
// Always release connections - use try/finally
const client = await pool.connect();
try {
const result = await client.query('SELECT ...', [params]);
return result.rows;
} finally {
client.release(); // critical - never omit
}

Anti-patterns / common mistakes
| Mistake | Why it's wrong | What to do instead |
|---|---|---|
| Optimizing without profiling | Fixes the wrong thing; wastes time; may degrade perf elsewhere | Profile first, let data identify the bottleneck |
| Benchmarking without warmup | V8 JIT hasn't compiled the hot path; results are misleading | Run 3+ warmup iterations before measuring |
| Using averages instead of percentiles | Hides tail latency that real users experience | Report P50, P95, P99 together |
| Caching everything eagerly | Stale data, unbounded memory growth, invalidation nightmares | Cache only measured hot reads; define TTL and invalidation upfront |
| Blocking the event loop with sync I/O | Freezes all concurrent requests for the duration | Use async fs/net APIs; move CPU work to worker threads |
| Measuring in development, deploying to production | V8 opts, GC pressure, and concurrency behave differently in prod | Profile under production-like load with production build |
Gotchas
Microbenchmarks without preventing dead-code elimination produce meaningless results - V8 will optimize away computations whose results are never used. A benchmark that calls `computeResult()` without consuming the return value may be measuring near-zero work. Always store the result in a variable and use it (e.g., `sum += result`) so the compiler cannot eliminate the hot path.

Connection pool exhaustion masquerades as slow queries - If all DB connections are in use, new queries queue behind them and appear in traces as 500ms+ "database time" when the query itself takes 5ms. Check `pool.totalCount`, `pool.idleCount`, and `pool.waitingCount` before optimizing queries. Pool exhaustion often looks like a slow DB, not like a pool problem.

Profiling in development produces unrepresentative results - V8 optimizes differently in development (no minification, source maps active, NODE_ENV=development guards enabled). Profiling a dev build and optimizing based on that output can be entirely misleading. Always profile against a production build with production environment variables and realistic data volume.
Heap snapshots taken during GC produce inflated retained sizes - If you trigger a heap snapshot during a GC cycle, the snapshot may show objects that are already queued for collection but not yet freed. Compare two snapshots taken at the same phase of your workload (e.g., both after processing 100 requests) to get valid comparisons.
Worker threads do not share memory by default - serialization overhead can exceed compute savings - Offloading a task to a worker thread requires serializing input data (via `postMessage`) and deserializing results back. For tasks involving large objects, this serialization cost can exceed the compute benefit. Use `SharedArrayBuffer` for large data payloads that need to cross the worker boundary frequently.
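A minimal sketch of the SharedArrayBuffer alternative, shown in a single thread for brevity - passing `shared` to a Worker via `workerData` exposes the same bytes to both sides without copying:

```typescript
// A SharedArrayBuffer is shared between threads rather than structured-cloned
// the way ordinary postMessage payloads are; Atomics gives race-free access.
const shared = new SharedArrayBuffer(4 * Int32Array.BYTES_PER_ELEMENT);
const view = new Int32Array(shared);

Atomics.store(view, 0, 42); // a worker holding the same buffer sees this write
console.log(Atomics.load(view, 0)); // 42

// Handing it off: new Worker(file, { workerData: shared }) transfers a
// reference, not a copy, so large payloads cross the boundary cheaply.
```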
References
Load the relevant reference file only when the current task requires it:
- `references/profiling-tools.md` - Node.js profiler, Chrome DevTools, Lighthouse, clinic.js, 0x, and how to choose between them
References
profiling-tools.md
Profiling Tools Reference
This reference covers the primary profiling tools used in Node.js and browser performance analysis. Each section describes what the tool does, when to use it, its overhead profile, and concrete usage instructions.
Tool selection guide
| Situation | Best tool |
|---|---|
| First look at a production Node.js service | clinic doctor |
| Deep CPU flame graph for Node.js | 0x or clinic flame |
| Async I/O and event loop bottlenecks | clinic bubbleprof |
| Browser page load performance (LCP, TTI) | Chrome DevTools Performance or Lighthouse |
| Automated Lighthouse CI in a pipeline | lighthouse CLI |
| Quick heap leak investigation | Chrome DevTools Memory tab |
| Programmatic Node.js heap snapshot | v8.writeHeapSnapshot() |
Node.js built-in --prof
The V8 sampling profiler built into Node.js. Zero external dependencies, available everywhere Node runs.
Overhead: Low (1-5% in most cases). Safe to use in staging environments.
Usage:
# 1. Start your process with profiling enabled
node --prof app.js
# 2. Apply load (with a load tester, real traffic, or a script)
# V8 writes isolate-<pid>-<seq>-v8.log during this time
# 3. Convert the log to human-readable output
node --prof-process isolate-*.log > profile.txt
# 4. Look for [Bottom up (heavy) profile] section
# Functions with high self-time % are the hot paths

Interpreting output:
The --prof-process output has two sections:
- `[Top down]` - call tree from root, shows inclusive time
- `[Bottom up (heavy) profile]` - sorted by self-time, shows where CPU actually spends time
Focus on the bottom-up section first. A function with 30%+ self-time is a primary optimization candidate.
Limitations: Text-based output is hard to navigate for deep call stacks. Use 0x
or Chrome DevTools for visual flame graphs.
0x - Flame Graph Generator
0x wraps --prof and converts the V8 log directly into an interactive HTML flame
graph. It is the fastest way to get a visual CPU profile for a Node.js process.
Overhead: Same as --prof (1-5%). Not recommended for production without traffic
isolation.
Installation:
npm install -g 0x
# or use npx without installing

Usage:
# Wrap any node command
npx 0x -- node server.js
npx 0x -- node -r ts-node/register server.ts
# Apply your load while the process runs, then send SIGINT (Ctrl+C)
# 0x writes: flamegraph.html in the current directory

Reading a flame graph:
- X-axis: proportion of time (wider = more time)
- Y-axis: call stack depth (callee on top of caller)
- Hot paths: wide bars near the top of a call chain
- Click any frame to zoom in; press Escape to zoom out
- Filter by file name in the search box to focus on application code vs. Node internals
Tip: Use the "invert" toggle to flip the graph - this puts hot self-time functions at
the top, similar to --prof-process bottom-up output.
clinic.js
A suite of three diagnostic tools from NearForm, each targeting a different class of Node.js performance problem.
Installation:
npm install -g clinic

clinic doctor
General-purpose diagnostic. Runs your process, collects metrics, and produces an HTML report with recommended next steps. Start here when you don't know where the problem is.
clinic doctor -- node server.js
# Apply load, then Ctrl+C
# Opens doctor-report/ in browser

The report shows CPU, memory, event loop delay, and active handles over time. It flags symptoms like "event loop blocked" or "memory growing without GC collection" with explanations and links to the relevant clinic sub-tool.
clinic flame
Generates a flame graph optimized for Node.js, similar to 0x but with additional
filtering for async frames and built-in Node internals suppression.
clinic flame -- node server.js

Prefer clinic flame over 0x when you have async/await-heavy code and want async
frames visible in the flame graph.
clinic bubbleprof
Visualizes asynchronous delays and I/O wait time as a bubble diagram. Best for understanding where time goes between async operations (waiting on DB, network, timers).
clinic bubbleprof -- node server.js

Use bubbleprof when the CPU profiler shows low CPU utilization but latency is still
high - the bottleneck is I/O wait or async queuing, not CPU.
Chrome DevTools - Performance Tab
The browser's built-in CPU and rendering profiler. Works for both frontend JavaScript
and (via --inspect) Node.js processes.
Overhead: Recording adds 5-15% overhead. Always test in a separate browser profile without extensions.
Browser profiling
- Open DevTools > Performance tab
- Click the record button (circle icon)
- Reproduce the slow interaction
- Click stop
- Analyze the flame chart
Key sections in the Performance timeline:
- Network - resource loading waterfall
- Main - JavaScript execution flame chart on the main thread
- Compositor / Raster - paint and layout work
- Long Tasks - red diagonal stripes indicate tasks > 50ms (block user input)
Identifying long tasks: Any red-striped block on the Main thread is a long task. Click it to see the full call stack. Long tasks over 50ms delay user interaction responses (INP).
Node.js remote profiling via --inspect
# Start Node with inspector
node --inspect server.js
# or break on start:
node --inspect-brk server.js

Then open chrome://inspect in Chrome, click "inspect" next to your process.
This opens a full DevTools instance connected to the Node process. Use the
Performance tab identically to browser profiling.
Chrome DevTools - Memory Tab
Three modes for investigating heap usage:
Heap snapshot
Takes a full snapshot of all live objects. Compare two snapshots to find leaks:
- DevTools > Memory > Heap snapshot > Take snapshot (baseline)
- Trigger the suspected leak operation N times
- Take a second snapshot
- Select "Comparison" view from the dropdown
- Sort by "# Delta" (new objects) or "Size Delta" (retained bytes)
Objects with positive size delta that are unexpected are your leak candidates. Click any object to see its retainer path - the chain of references keeping it alive.
Allocation instrumentation
Records all allocations over time with call stacks. More expensive than snapshots but shows exactly which code paths are allocating memory.
DevTools > Memory > Allocation instrumentation on timeline > Start

Allocation sampling
Low-overhead sampling of allocations. Good for long-running production-like sessions where instrumentation is too expensive.
Lighthouse
Google's automated auditing tool for web performance, accessibility, SEO, and best practices. Measures Core Web Vitals and provides scored recommendations.
CLI usage:
npm install -g lighthouse
# Run against a URL, output to HTML report
lighthouse https://example.com --output html --output-path ./report.html
# Run in CI mode (JSON output, no browser window)
lighthouse https://example.com --output json --chrome-flags="--headless"

Key performance metrics Lighthouse measures:
- FCP (First Contentful Paint) - target < 1.8s
- LCP (Largest Contentful Paint) - target < 2.5s
- TBT (Total Blocking Time) - proxy for INP; target < 200ms
- CLS (Cumulative Layout Shift) - target < 0.1
- Speed Index - how quickly content is visually populated
Lighthouse CI for continuous monitoring:
npm install -g @lhci/cli
# Add to CI pipeline
lhci autorun --collect.url=https://staging.example.com \
  --assert.preset=lighthouse:recommended

Set budgets.json to fail CI when performance regresses:
[
{
"path": "/*",
"timings": [
{ "metric": "largest-contentful-paint", "budget": 2500 },
{ "metric": "total-blocking-time", "budget": 200 }
],
"resourceSizes": [
{ "resourceType": "script", "budget": 150 }
]
}
]

v8.writeHeapSnapshot() - Programmatic Snapshots
For automated leak detection in long-running processes without manual DevTools interaction:
import { writeHeapSnapshot } from 'v8';
import { setInterval } from 'timers';
// Take a snapshot every 30 minutes to track growth
setInterval(() => {
const filename = writeHeapSnapshot();
console.log(`Heap snapshot written: ${filename}`);
}, 30 * 60 * 1000).unref();
// Or trigger via HTTP endpoint for on-demand snapshots
app.post('/debug/heap-snapshot', (_req, res) => {
const filename = writeHeapSnapshot();
res.json({ filename });
});

The generated .heapsnapshot file can be loaded in Chrome DevTools > Memory > Load.
Security note: Never expose a heap snapshot endpoint publicly - snapshots contain all in-memory data including secrets, tokens, and user data.
Choosing between tools - decision flowchart
Is the problem CPU (high CPU %) or I/O wait (low CPU, high latency)?
CPU -> Use 0x or clinic flame for a flame graph
I/O -> Use clinic bubbleprof to visualize async delays
Is it a Node.js backend or browser frontend?
Node.js -> 0x / clinic / --prof / --inspect + Chrome DevTools
Browser -> Chrome DevTools Performance + Lighthouse
Do you need automated CI performance regression detection?
YES -> Lighthouse CI with performance budgets
NO -> Manual DevTools or 0x run
Is memory growing unboundedly?
YES -> Chrome DevTools Memory > Heap snapshot comparison
  or v8.writeHeapSnapshot() + programmatic diff