database-engineering
Use this skill when designing database schemas, optimizing queries, creating indexes, planning migrations, or choosing between database technologies. Triggers on schema design, normalization, indexing strategies, query optimization, EXPLAIN plans, migrations, partitioning, replication, connection pooling, and any task requiring database architecture or performance decisions.
database-engineering
database-engineering is a production-ready AI agent skill for claude-code, gemini-cli, and openai-codex. It helps with designing database schemas, optimizing queries, creating indexes, planning migrations, and choosing between database technologies.
Quick Facts
| Field | Value |
|---|---|
| Category | engineering |
| Version | 0.1.0 |
| Platforms | claude-code, gemini-cli, openai-codex |
| License | MIT |
How to Install
- Make sure you have Node.js installed on your machine.
- Run the following command in your terminal:
npx skills add AbsolutelySkilled/AbsolutelySkilled --skill database-engineering
- The database-engineering skill is now available in your AI coding agent (Claude Code, Gemini CLI, OpenAI Codex, etc.).
Overview
A disciplined framework for designing, optimizing, and evolving relational databases in production. This skill covers schema design, indexing strategies, query optimization, safe migrations, and operational concerns like connection pooling and partitioning. It is opinionated about PostgreSQL but most principles apply to any SQL database. The goal is to help you make the right trade-off at each decision point, not just hand you a syntax reference.
Tags
database sql schema indexing optimization migrations
Platforms
- claude-code
- gemini-cli
- openai-codex
Frequently Asked Questions
What is database-engineering?
Use this skill when designing database schemas, optimizing queries, creating indexes, planning migrations, or choosing between database technologies. Triggers on schema design, normalization, indexing strategies, query optimization, EXPLAIN plans, migrations, partitioning, replication, connection pooling, and any task requiring database architecture or performance decisions.
How do I install database-engineering?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill database-engineering in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support database-engineering?
This skill works with claude-code, gemini-cli, openai-codex. Install it once and use it across any supported AI coding agent.
Maintainers
Generated from AbsolutelySkilled
SKILL.md
Database Engineering
A disciplined framework for designing, optimizing, and evolving relational databases in production. This skill covers schema design, indexing strategies, query optimization, safe migrations, and operational concerns like connection pooling and partitioning. It is opinionated about PostgreSQL but most principles apply to any SQL database. The goal is to help you make the right trade-off at each decision point, not just hand you a syntax reference.
When to use this skill
Trigger this skill when the user:
- Designs a database schema or needs normalization guidance
- Asks about creating or tuning indexes (composite, partial, covering)
- Wants to understand or optimize a slow query or EXPLAIN plan
- Plans a database migration (adding columns, renaming, dropping, backfilling)
- Implements soft deletes, audit trails, or temporal data patterns
- Sets up connection pooling (PgBouncer, application-level pools)
- Partitions a large table by time, hash, or range
- Chooses between replication strategies (read replicas, logical replication)
- Investigates deadlocks, connection exhaustion, or lock contention
Do NOT trigger this skill for:
- NoSQL / document store design (MongoDB, DynamoDB) - different trade-off space
- ORM-specific configuration questions unrelated to the underlying SQL
Key principles
Normalize first, then denormalize with a documented reason - Start in third normal form. Every denormalization must be a conscious decision backed by a measured performance requirement, not a guess. Write a comment explaining why.
Index for your queries, not your tables - An index that does not serve a query is write overhead and bloat. Before adding an index, write out the query it serves and confirm with EXPLAIN ANALYZE that it is actually used.
Migrations must be reversible - Every schema change should have a rollback path. Use the expand-contract pattern for breaking changes: add the new shape, migrate data, deprecate the old shape, then drop it in a later release.
Measure before optimizing - EXPLAIN ANALYZE is the ground truth. Never tune a query without first reading the plan. A query that looks slow may be fast; a query that looks fast may be causing invisible downstream load.
Plan for growth at schema design time - Ask: "What happens at 100x rows? At 10x write throughput?" Identify which columns will need indexes, which tables might need partitioning, and which joins will become expensive before the schema is locked.
Core concepts
Normalization forms
| Form | What it eliminates | When to stop here |
|---|---|---|
| 1NF | Repeating groups, non-atomic columns | Almost never - baseline only |
| 2NF | Partial dependencies on composite keys | Rare - get to 3NF |
| 3NF | Transitive dependencies | Default target for OLTP schemas |
| BCNF | Remaining anomalies in 3NF edge cases | When you have overlapping candidate keys |
Denormalize (with intent) for read-heavy aggregations, pre-computed summaries, or when JOINs across normalized tables are measured to be a bottleneck.
Index types
| Type | Structure | Best for |
|---|---|---|
| B-tree | Balanced tree | Equality, range, ORDER BY, IS NULL - the default |
| Hash | Hash table | Equality-only lookups (rarely faster than B-tree in Postgres) |
| GIN | Inverted index | JSONB keys, full-text search, array containment |
| GiST | Generalized search tree | Geometric data, range types, nearest-neighbor |
| BRIN | Block range index | Very large append-only tables sorted by a natural order (e.g. timestamps) |
Composite B-tree indexes follow the leftmost prefix rule: an index on (a, b, c)
serves queries filtering on a, (a, b), or (a, b, c) - but not (b, c) alone.
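One way to internalize the rule: a composite B-tree index fully serves an equality filter only when the filtered columns form a contiguous leftmost prefix of the index columns. A small illustrative sketch (not a Postgres API; Postgres can still partially use the index in some mixed cases):

```python
def index_serves(index_cols, filter_cols):
    """True if a B-tree index on index_cols can fully serve an
    equality filter on exactly filter_cols (leftmost prefix rule)."""
    prefix_len = 0
    for col in index_cols:
        if col in filter_cols:
            prefix_len += 1
        else:
            break  # prefix is broken at the first unfiltered column
    return prefix_len == len(filter_cols)

# Index on (a, b, c):
assert index_serves(("a", "b", "c"), {"a"})
assert index_serves(("a", "b", "c"), {"a", "b", "c"})
assert not index_serves(("a", "b", "c"), {"b", "c"})
```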
ACID and WAL
ACID (Atomicity, Consistency, Isolation, Durability) guarantees that transactions are all-or-nothing, maintain invariants, are isolated from each other, and survive crashes. PostgreSQL implements these via MVCC (Multi-Version Concurrency Control) - readers never block writers and vice versa.
WAL (Write-Ahead Log) is the mechanism for durability and replication. Every change is written to the WAL before it hits the data file. Streaming replication ships WAL segments to replicas. Logical replication decodes WAL into row-level change events.
Connection pooling
Each PostgreSQL connection is a forked OS process (~5-10 MB RAM). At 500 direct connections, the database is spending more time on connection overhead than queries. PgBouncer in transaction mode is the standard solution - it multiplexes many application connections onto a small pool of server connections. Target 10-20 server connections per core as a starting point.
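The sizing heuristic above is simple arithmetic, sketched below (the 10-20 per core figure is this section's starting point, not a universal constant; tune from measurements):

```python
def starting_pool_size(cores: int, per_core: int = 15) -> int:
    # Starting-point heuristic: 10-20 server connections per core.
    return cores * per_core

def multiplex_ratio(client_conns: int, server_conns: int) -> float:
    # PgBouncer in transaction mode fans many client connections
    # into few server connections; this is the fan-in ratio.
    return client_conns / server_conns

# An 8-core box: ~120 server connections as a first guess.
assert starting_pool_size(8) == 120
# 1000 app connections over a pool of 25 -> 40x multiplexing.
assert multiplex_ratio(1000, 25) == 40.0
```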
Read replicas
Streaming replicas receive WAL in near-real-time (seconds of lag typical, configurable). Use them to offload analytics, reporting, and read-heavy background jobs. Replication lag means replicas can return stale data - never send reads that require post-write consistency to a replica.
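One common guard against stale reads is session-level routing: send a read to the replica only if the session has not written within the replica's worst-case lag window. A hypothetical sketch (the `Router` class and `max_lag_s` parameter are illustrative, not a real library):

```python
import time

class Router:
    """Route reads to a replica unless this session wrote recently
    enough that the replica may not have replayed the write yet."""
    def __init__(self, max_lag_s: float = 2.0):
        self.max_lag_s = max_lag_s
        self.last_write_at = float("-inf")

    def record_write(self, now=None):
        self.last_write_at = time.monotonic() if now is None else now

    def target_for_read(self, now=None) -> str:
        now = time.monotonic() if now is None else now
        if now - self.last_write_at < self.max_lag_s:
            return "primary"   # read-your-writes: stay on primary
        return "replica"       # safe to offload

r = Router(max_lag_s=2.0)
r.record_write(now=100.0)
assert r.target_for_read(now=100.5) == "primary"
assert r.target_for_read(now=103.0) == "replica"
```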
Common tasks
Design a normalized schema
Start from an e-commerce domain. Identify entities, attributes, and relationships before writing DDL.
-- 1. Core entities in 3NF
CREATE TABLE customers (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
sku TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
price_cents INT NOT NULL CHECK (price_cents >= 0)
);
-- 2. Orders reference customers - foreign key with index
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id),
status TEXT NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending','confirmed','shipped','cancelled')),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);
-- 3. Junction table for order line items
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL REFERENCES orders(id),
product_id BIGINT NOT NULL REFERENCES products(id),
quantity INT NOT NULL CHECK (quantity > 0),
unit_price_cents INT NOT NULL
);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
`unit_price_cents` is intentionally denormalized from `products.price_cents`. Prices change over time; the order must record what the customer was charged.
Create effective indexes
-- Composite index: filter first on equality columns, then range/sort
-- Serves: WHERE org_id = ? AND status = ? ORDER BY created_at DESC
CREATE INDEX idx_orders_org_status_created
ON orders(org_id, status, created_at DESC);
-- Partial index: only index the rows you actually query
-- Saves space and stays small even as the table grows
CREATE INDEX idx_orders_pending
ON orders(customer_id, created_at)
WHERE status = 'pending';
-- Covering index: include non-filter columns to avoid heap fetch
-- The query can be answered entirely from the index (index-only scan)
CREATE INDEX idx_products_sku_covering
ON products(sku)
INCLUDE (name, price_cents);
-- Check index usage - drop indexes with low scans
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
Read and optimize EXPLAIN plans
-- Always use EXPLAIN ANALYZE (BUFFERS) for real execution data
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, c.email, sum(oi.quantity * oi.unit_price_cents)
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
WHERE o.status = 'pending'
GROUP BY o.id, c.email;
Key things to read in the plan output:
| Signal | What it means | Action |
|---|---|---|
| `Seq Scan` on a large table | No usable index | Add an index on the filter column |
| `rows=10000` vs actual `rows=3` | Bad statistics | Run `ANALYZE tablename` |
| `Hash Join` with large `Batches` | Spilling to disk | Increase `work_mem` or add index |
| `Nested Loop` with large outer set | N+1 at the SQL level | Rewrite as hash join or batch |
| High `Buffers: shared hit` | Data in cache - good | No action needed |
| High `Buffers: shared read` | Data read from disk | Consider more cache or BRIN index |
Write safe migrations
Use the expand-contract pattern for zero-downtime changes:
-- Phase 1 (expand): add nullable column, old code ignores it
ALTER TABLE orders ADD COLUMN notes TEXT;
-- Phase 2 (backfill): run in batches to avoid locking
DO $$
DECLARE
  batch_size INT := 1000;
  last_id BIGINT := 0;
  max_id BIGINT;
BEGIN
  SELECT coalesce(max(id), 0) INTO max_id FROM orders;
  WHILE last_id < max_id LOOP
    UPDATE orders
    SET notes = ''
    WHERE id > last_id AND id <= last_id + batch_size AND notes IS NULL;
    -- Advance by key window, not by rows updated: a window where every
    -- row is already backfilled would report ROW_COUNT = 0 mid-table.
    last_id := last_id + batch_size;
    PERFORM pg_sleep(0.05); -- yield between batches to reduce lock contention
  END LOOP;
END $$;
-- Phase 3 (contract): add NOT NULL constraint after all rows are filled
ALTER TABLE orders ALTER COLUMN notes SET NOT NULL;
ALTER TABLE orders ALTER COLUMN notes SET DEFAULT '';
Never run `ALTER TABLE ... ADD COLUMN ... NOT NULL` without a DEFAULT on Postgres < 11: it rewrites the entire table and takes an exclusive lock. On Postgres 11+ it is safe only if the default is a constant.
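The phase-2 backfill walks the primary key in fixed windows. The same pattern, sketched outside SQL (illustrative only; `max_id` would come from `SELECT max(id)`, and the half-open windows map to `WHERE id > lo AND id <= hi`):

```python
def id_windows(max_id: int, batch_size: int):
    """Yield half-open (lo, hi] key windows covering 1..max_id,
    so each UPDATE touches at most batch_size ids."""
    lo = 0
    while lo < max_id:
        hi = min(lo + batch_size, max_id)
        yield (lo, hi)  # maps to: WHERE id > lo AND id <= hi
        lo = hi

windows = list(id_windows(2500, 1000))
assert windows == [(0, 1000), (1000, 2000), (2000, 2500)]
```

Driving the loop by key windows (rather than rows updated) makes the backfill restartable and immune to windows that happen to contain no unfilled rows.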
Implement soft deletes vs hard deletes
-- Soft delete pattern
ALTER TABLE customers ADD COLUMN deleted_at TIMESTAMPTZ;
-- Partial index keeps active-record queries fast
CREATE INDEX idx_customers_active ON customers(email) WHERE deleted_at IS NULL;
-- Application queries always filter
SELECT * FROM customers WHERE deleted_at IS NULL AND email = $1;
-- Hard delete with archival (for GDPR / data retention)
WITH deleted AS (
DELETE FROM customers WHERE id = $1 RETURNING *
)
INSERT INTO customers_archive SELECT *, now() AS archived_at FROM deleted;
Prefer hard deletes with an archive table for compliance-sensitive data. Use soft deletes only when you need "undo" semantics or audit trails.
Set up connection pooling
# pgbouncer.ini - transaction mode is best for most web workloads
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000 ; application connections in
default_pool_size = 25 ; server connections per database
min_pool_size = 5
reserve_pool_size = 5
server_lifetime = 3600
server_idle_timeout = 600
log_connections = 0 ; disable in high-throughput environments
In transaction mode, prepared statements and `SET` commands do not persist across connections. Use `DEALLOCATE ALL` or disable prepared statements in your driver (`prepared_statement_cache_size=0` in JDBC).
Partition large tables
-- Range partition by month (good for time-series, logs, events)
CREATE TABLE events (
id BIGSERIAL,
created_at TIMESTAMPTZ NOT NULL,
type TEXT NOT NULL,
payload JSONB
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2024_01
PARTITION OF events FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02
PARTITION OF events FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Automate with pg_partman extension
SELECT partman.create_parent(
p_parent_table => 'public.events',
p_control => 'created_at',
p_type => 'native',
p_interval => 'monthly'
);
-- Partition pruning - Postgres skips partitions outside the WHERE range
EXPLAIN SELECT * FROM events WHERE created_at >= '2024-01-15';
-- Should show: Append -> Seq Scan on events_2024_01 (only one child scanned)
Error handling
| Error | Root cause | Resolution |
|---|---|---|
| `deadlock detected` | Two transactions acquiring the same locks in opposite order | Enforce a consistent lock acquisition order; use `SELECT ... FOR UPDATE SKIP LOCKED` for queue patterns |
| `too many connections` | App creating connections faster than they close | Add PgBouncer; audit connection pool settings; check for connection leaks |
| `canceling statement due to conflict with recovery` | Long query on replica conflicts with WAL replay | Increase `max_standby_streaming_delay`; move analytics to a dedicated replica |
| `could not serialize access due to concurrent update` | SERIALIZABLE isolation write conflict | Retry the transaction; this is expected behavior, not a bug |
| Index bloat / slow index scans | Dead tuples not vacuumed, bloated index pages | Run `VACUUM ANALYZE`; tune `autovacuum_vacuum_scale_factor` for high-churn tables |
| Query slow after data growth | Missing index or stale planner statistics | Run `ANALYZE tablename`; check with `EXPLAIN (ANALYZE, BUFFERS)` |
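Autovacuum fires on a table once dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples` (defaults 50 and 0.2). A quick calculation shows why the default scale factor is too lax for large high-churn tables:

```python
def autovacuum_trigger(reltuples: int, threshold: int = 50,
                       scale_factor: float = 0.2) -> int:
    # Number of dead tuples that must accumulate before
    # autovacuum processes the table.
    return int(threshold + scale_factor * reltuples)

# Default settings: a 100M-row table accumulates 20M dead tuples first.
assert autovacuum_trigger(100_000_000) == 20_000_050
# Tuned to 1%: vacuum starts after ~1M dead tuples instead.
assert autovacuum_trigger(100_000_000, scale_factor=0.01) == 1_000_050
```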
Gotchas
- `ALTER TABLE ... ADD COLUMN ... NOT NULL` locks the table on Postgres < 11 - Without a constant DEFAULT, Postgres rewrites the entire table under an exclusive lock. On Postgres 11+ with a constant default it is safe, but a runtime-computed default still triggers a rewrite. Use the expand-contract pattern instead.
- Composite index leftmost prefix is strictly enforced - An index on `(a, b, c)` does not help a query that filters only on `b` or `c`. A common mistake is adding an index for a multi-column query and then reusing it in queries that don't start from the leftmost column. Always verify with `EXPLAIN ANALYZE`.
- PgBouncer transaction mode breaks prepared statements - Many ORMs and drivers (Prisma, JDBC) use prepared statements by default. In PgBouncer transaction mode, prepared statements don't persist across connections, causing `prepared statement "s1" does not exist` errors. Disable prepared statements in your driver (`prepared_statement_cache_size=0` for JDBC, `pgbouncer=true` in the connection string for Prisma).
- `VACUUM` doesn't reclaim disk space by default - Regular `VACUUM` marks dead tuples as reusable but doesn't shrink the file. Only `VACUUM FULL` returns disk space to the OS, but it acquires an exclusive lock and rewrites the table. Use `pg_repack` for online space reclamation on production tables.
- Partition pruning only works when the partition key is in the `WHERE` clause - Joining a partitioned table on a non-partition-key column causes Postgres to scan all partitions. Always include the partition column in range queries or the planner cannot prune.
References
For detailed patterns and implementation guidance, load the relevant file from `references/`:
- `references/query-optimization.md` - EXPLAIN ANALYZE deep dive, index types, join strategies, common bottlenecks
Only load a references file if the current task requires it - they are long and will consume context.
References
query-optimization.md
PostgreSQL Query Optimization Reference
A deep reference for reading EXPLAIN plans, choosing index types, understanding join strategies, and resolving the most common query performance bottlenecks in PostgreSQL.
EXPLAIN ANALYZE fundamentals
Basic usage
-- Minimum useful form: always use ANALYZE to get actual row counts
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- Full form: BUFFERS shows I/O, FORMAT TEXT is most readable
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, c.email
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
ORDER BY o.created_at DESC
LIMIT 50;
-- For long-running queries: use auto_explain in postgresql.conf
-- auto_explain.log_min_duration = '1s'
-- auto_explain.log_analyze = on
-- auto_explain.log_buffers = on
Anatomy of an EXPLAIN node
-> Index Scan using idx_orders_customer_id on orders (cost=0.43..8.45 rows=3 width=72)
^ ^ ^ ^
startup total estimated row
cost cost rows width
(actual time=0.021..0.031 rows=3 loops=1)
^ ^ ^
first row time last row actual rows (loops multiplies all of these)
  Buffers: shared hit=5 read=0
- `cost` is a dimensionless planner estimate, only meaningful in relative comparisons.
- `actual time` is in milliseconds. This is the real number to optimize.
- `loops` - the node was executed this many times; `actual time` and `rows` are per loop. Multiply by `loops` to get the total contribution.
- `Buffers: shared hit` - pages served from the shared buffer cache (RAM). Fast.
- `Buffers: shared read` - pages read from disk. 100-1000x slower.
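Because `actual time` is reported per loop, a node's real contribution is the per-loop time multiplied by `loops`. Worked as a small sketch:

```python
def node_total_ms(actual_time_ms: float, loops: int) -> float:
    # EXPLAIN ANALYZE reports per-loop time; the node's total cost
    # is per-loop time * number of executions.
    return actual_time_ms * loops

# An inner Index Scan at 0.03 ms/loop looks cheap, but inside a
# Nested Loop with 50,000 outer rows it contributes ~1.5 seconds.
assert abs(node_total_ms(0.03, 50_000) - 1500.0) < 1e-6
```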
Scan node types
| Node | Description | When it appears | Cost profile |
|---|---|---|---|
| `Seq Scan` | Reads every page in the table | No usable index, or filter selectivity is low (<~10%) | High for large tables |
| `Index Scan` | Traverses index, then fetches heap rows | Selective filter with a matching index | Low startup, moderate total |
| `Index Only Scan` | Answers query entirely from index | All needed columns are in a covering index | Lowest cost for point lookups |
| `Bitmap Index Scan` + `Bitmap Heap Scan` | Builds a bitmap of matching row locations, then fetches | Moderately selective filter, or OR across multiple indexes | Good middle ground |
| `TID Scan` | Fetches rows by physical location | Queries using `ctid` directly | Rare, avoid in application code |
When Postgres ignores your index
The planner chooses a Seq Scan when it estimates that the index path costs more than a sequential read. This happens when:
- Low selectivity - the column has few distinct values (e.g. a boolean, or a status column where 90% of rows have `status = 'active'`). The planner believes fetching nearly every heap page via the index is slower than a single sequential pass.
- Stale statistics - `pg_statistic` is out of date. Run `ANALYZE tablename` to refresh, or lower `autovacuum_analyze_scale_factor` for high-churn tables.
- Wrong column order in composite index - filtering on `(b, c)` when the index is `(a, b, c)` does not use the index (leftmost prefix rule).
- Type mismatch or implicit cast - `WHERE created_at = '2024-01-01'` on a `TIMESTAMPTZ` column works, but `WHERE date(created_at) = '2024-01-01'` wraps the column in a function and prevents index use. Use range predicates instead: `WHERE created_at >= '2024-01-01' AND created_at < '2024-01-02'`.
- Small table - Postgres skips the index if the table fits in a few pages. This is correct behavior.
Join strategies
| Strategy | Description | Best for |
|---|---|---|
| `Nested Loop` | For each outer row, scan inner relation | Small outer set + indexed inner lookup |
| `Hash Join` | Hash the smaller relation, probe with larger | Equi-joins where neither side is tiny; no index required |
| `Merge Join` | Sort both sides, merge in order | Large sorted inputs; good for ORDER BY queries |
Forcing a join strategy (for testing, never in production)
SET enable_nestloop = off; -- force hash or merge join
SET enable_hashjoin = off; -- force merge join
SET enable_mergejoin = off; -- force hash join
-- Remember to reset after testing:
RESET enable_nestloop;
Hash batches - disk spill warning
Hash (cost=... rows=50000 ...)
Buckets: 65536 Batches: 8 Memory Usage: 4096kB
^^^^^^^^^
`Batches > 1` means the hash spilled to disk.
Fix: increase `work_mem` for the session (`SET work_mem = '64MB'`) or add an index to convert the Hash Join to a Nested Loop with index lookup.
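A rough mental model for the batch count (an approximation; the planner's real accounting differs, but the shape is the same): the build side is split into the smallest power-of-two number of batches such that one batch fits in `work_mem`.

```python
import math

def estimated_batches(hash_build_kb: float, work_mem_kb: float) -> int:
    """Rough model only: smallest power of two so that
    hash_build / batches fits within work_mem."""
    if hash_build_kb <= work_mem_kb:
        return 1
    return 2 ** math.ceil(math.log2(hash_build_kb / work_mem_kb))

# A ~30MB build side against the default 4MB work_mem spills into 8 batches.
assert estimated_batches(30 * 1024, 4 * 1024) == 8
# Raising work_mem to 64MB brings the hash fully back into memory.
assert estimated_batches(30 * 1024, 64 * 1024) == 1
```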
Index types - when to use each
B-tree (default)
-- Good for: equality, range, ORDER BY, IS NULL / IS NOT NULL
CREATE INDEX ON orders(created_at); -- range queries
CREATE INDEX ON orders(customer_id, created_at DESC); -- composite + sort
Partial index
Only indexes rows matching a predicate. Stays small even as the full table grows. The query's WHERE clause must imply the partial index predicate for the planner to use it.
-- Indexes only pending orders - stays small as orders are fulfilled
CREATE INDEX idx_orders_pending_customer
ON orders(customer_id, created_at)
WHERE status = 'pending';
-- Query that uses this index (planner can infer status = 'pending')
SELECT * FROM orders WHERE customer_id = $1 AND status = 'pending';
Covering index (INCLUDE)
The INCLUDE clause adds non-key columns to the index leaf pages. The index cannot be
used for filtering or sorting on included columns, but an Index Only Scan can return them
without a heap fetch.
CREATE INDEX idx_products_sku_cover
ON products(sku)
INCLUDE (name, price_cents);
-- This query needs sku for lookup and name, price_cents for output.
-- With the covering index: Index Only Scan, no heap access.
SELECT name, price_cents FROM products WHERE sku = $1;
GIN - JSONB and full-text
-- Index all keys in a JSONB column
CREATE INDEX idx_events_payload ON events USING GIN (payload);
-- Query uses GIN for containment check
SELECT * FROM events WHERE payload @> '{"type": "purchase"}';
-- Full-text search
CREATE INDEX idx_products_fts ON products
USING GIN (to_tsvector('english', name || ' ' || description));
SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || description)
@@ plainto_tsquery('english', 'wireless keyboard');
BRIN - large append-only tables
-- Events table with natural time ordering. BRIN stores min/max per block range.
-- Tiny index size (128 pages vs millions for B-tree), but coarser.
CREATE INDEX idx_events_created_brin ON events
USING BRIN (created_at) WITH (pages_per_range = 128);
-- Effective only when the table is physically ordered by created_at (append-only pattern)
Common bottlenecks and fixes
N+1 at the SQL level
Symptom: Nested Loop with a large outer row count, many Index Scans on the inner
relation.
-- N+1: one query per order to get customer
SELECT id FROM orders WHERE status = 'pending'; -- returns 5000 rows
-- then for each: SELECT email FROM customers WHERE id = ?
-- Fix: single JOIN
SELECT o.id, c.email
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending';
Missing index on foreign key
PostgreSQL does NOT automatically create indexes on foreign key columns (unlike MySQL).
Every FK referencing a parent table needs an explicit index on the child side or
ON DELETE CASCADE / ON DELETE RESTRICT will cause full scans on the child table.
-- Always add an index on the FK column
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);
Function on indexed column
-- BAD: function wraps the column, index unused
SELECT * FROM users WHERE lower(email) = 'alice@example.com';
-- FIX option 1: expression index
CREATE INDEX idx_users_email_lower ON users(lower(email));
-- FIX option 2: store the normalized form
ALTER TABLE users ADD COLUMN email_lower TEXT GENERATED ALWAYS AS (lower(email)) STORED;
CREATE INDEX ON users(email_lower);
High dead tuple ratio (bloat)
-- Check table bloat
SELECT relname,
n_dead_tup,
n_live_tup,
round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
ORDER BY dead_pct DESC NULLS LAST;
-- Trigger immediate vacuum + analyze
VACUUM ANALYZE orders;
-- Reclaim space (locks table briefly)
VACUUM FULL orders; -- avoid in production unless bloat is severe
Tune autovacuum for high-churn tables to prevent bloat from accumulating:
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01, -- trigger at 1% dead tuples (default 20%)
autovacuum_analyze_scale_factor = 0.005 -- update stats at 0.5% changes
);
Lock contention
-- See what is waiting and what is blocking
SELECT
blocked.pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query,
now() - blocked.query_start AS wait_duration
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';
-- Kill a blocking query (graceful)
SELECT pg_cancel_backend(<blocking_pid>);
-- Kill a blocking query (immediate)
SELECT pg_terminate_backend(<blocking_pid>);
Query planning configuration knobs
| Parameter | Default | Tune when |
|---|---|---|
| `work_mem` | 4MB | Hash joins spilling to disk, sort operations slow |
| `shared_buffers` | 128MB | Buffer cache hit rate low; set to 25% of RAM |
| `effective_cache_size` | 4GB | Planner underestimates cache; set to 50-75% of RAM |
| `random_page_cost` | 4.0 | On SSD: set to 1.1-2.0; planner prefers seq scans too much otherwise |
| `enable_seqscan` | on | Set to off in session to force index use during testing only |
-- Temporary session-level tuning for a heavy report query
SET work_mem = '256MB';
SET enable_seqscan = off; -- testing only
-- run query
RESET ALL;
Query optimization checklist
- Run `EXPLAIN (ANALYZE, BUFFERS)` before touching any code
- Find the node with the highest `actual time` - that is the bottleneck
- Check `rows` estimate vs actual - large divergence means stale statistics (`ANALYZE`)
- Look for `Seq Scan` on large tables with low selectivity filters
- Look for `Hash Join` with `Batches > 1` - increase `work_mem` or add an index
- Look for `Nested Loop` with high outer loop count - classic N+1 pattern
- Check for function calls wrapping indexed columns in WHERE
- Verify FK columns have indexes on the child side
- Check `pg_stat_user_tables` for high dead tuple ratios
- After adding an index: re-run EXPLAIN to confirm it is used (`Index Scan` or `Index Only Scan`)
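The first few checklist items can be partially automated by scanning the plan text. A hypothetical sketch (the `plan_warnings` helper and its regex are illustrative; a real tool should parse `EXPLAIN (FORMAT JSON)` instead):

```python
import re

def plan_warnings(plan_text: str, est_vs_actual_ratio: float = 100.0):
    """Flag Seq Scans and large estimate-vs-actual row divergence in
    EXPLAIN ANALYZE text output. Sketch only; prefer FORMAT JSON."""
    warnings = []
    for line in plan_text.splitlines():
        if "Seq Scan" in line:
            warnings.append("seq_scan: " + line.strip())
        m = re.search(r"rows=(\d+) .*actual time=[\d.]+\.\.[\d.]+ rows=(\d+)", line)
        if m:
            est, actual = int(m.group(1)), int(m.group(2))
            if actual and max(est, actual) / min(est, actual) >= est_vs_actual_ratio:
                warnings.append(f"stale_stats? est={est} actual={actual}")
    return warnings

plan = "Seq Scan on orders  (cost=0.00..100 rows=10000 width=8) (actual time=0.1..5.0 rows=3 loops=1)"
w = plan_warnings(plan)
assert any(x.startswith("seq_scan") for x in w)
assert any("est=10000 actual=3" in x for x in w)
```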