system-design
Use this skill when designing distributed systems, architecting scalable services, preparing for system design interviews, or making infrastructure decisions. Triggers on load balancing, CAP theorem, sharding, replication, caching strategies, message queues, microservices architecture, database selection, rate limiting, and any task requiring high-level system architecture decisions.
system-design
system-design is a production-ready AI agent skill for claude-code, gemini-cli, openai-codex. It covers designing distributed systems, architecting scalable services, preparing for system design interviews, and making infrastructure decisions.
Quick Facts
| Field | Value |
|---|---|
| Category | engineering |
| Version | 0.1.0 |
| Platforms | claude-code, gemini-cli, openai-codex |
| License | MIT |
How to Install
- Make sure you have Node.js installed on your machine.
- Run the following command in your terminal:
npx skills add AbsolutelySkilled/AbsolutelySkilled --skill system-design
- The system-design skill is now available in your AI coding agent (Claude Code, Gemini CLI, OpenAI Codex, etc.).
Overview
A practical framework for designing distributed systems and architecting scalable services. This skill covers the core building blocks - load balancers, databases, caches, queues, and CDNs - plus the trade-off reasoning required to use them well. It is built around interview scenarios because they compress the full design process into a repeatable structure you can also apply in real-world architecture decisions. Agents can use this skill to work through any system design problem from capacity estimation through detailed component design.
Tags
architecture distributed-systems scalability infrastructure design
Platforms
- claude-code
- gemini-cli
- openai-codex
Frequently Asked Questions
What is system-design?
Use this skill when designing distributed systems, architecting scalable services, preparing for system design interviews, or making infrastructure decisions. Triggers on load balancing, CAP theorem, sharding, replication, caching strategies, message queues, microservices architecture, database selection, rate limiting, and any task requiring high-level system architecture decisions.
How do I install system-design?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill system-design in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support system-design?
This skill works with claude-code, gemini-cli, openai-codex. Install it once and use it across any supported AI coding agent.
Maintainers
Generated from AbsolutelySkilled
SKILL.md
System Design
A practical framework for designing distributed systems and architecting scalable services. This skill covers the core building blocks - load balancers, databases, caches, queues, and CDNs - plus the trade-off reasoning required to use them well. It is built around interview scenarios because they compress the full design process into a repeatable structure you can also apply in real-world architecture decisions. Agents can use this skill to work through any system design problem from capacity estimation through detailed component design.
When to use this skill
Trigger this skill when the user:
- Asks "how would you design X?" where X is a product or service
- Needs to choose between SQL and NoSQL databases
- Is evaluating load balancing, sharding, or replication strategies
- Asks about the CAP theorem or consistency vs availability trade-offs
- Is designing a caching strategy (what to cache, where, how to invalidate)
- Needs to estimate traffic, storage, or bandwidth for a system
- Is preparing for a system design interview
- Asks about rate limiting, API gateways, or CDN placement
Do NOT trigger this skill for:
- Line-level code review or specific algorithm implementations (use a coding skill)
- DevOps/infrastructure provisioning details like Terraform or Kubernetes manifests
Key principles
Start simple and justify complexity - Design the simplest system that satisfies the requirements. Introduce each new component (queue, cache, shard) only when you can name the specific constraint it solves. Complexity is a cost, not a feature.
Network partitions will happen - choose C or A - CAP theorem says distributed systems must sacrifice either consistency or availability during a partition. You cannot avoid partitions (P is not a choice). Pick CP for financial and inventory data; pick AP for feeds, caches, and preferences.
Scale horizontally, partition vertically - Stateless services scale out behind a load balancer. Data scales by separating hot from cold paths: read replicas before sharding, sharding before multi-region. Vertical scaling buys time; horizontal scaling buys headroom.
Design for failure at every layer - Every service will go down. Every disk will fill. Design fallback behavior before the happy path. Timeouts, retries with backoff, circuit breakers, and bulkheads are not optional refinements - they are table stakes.
Single responsibility for components - A component that does two things will be bad at both. Load balancers balance load. Caches serve reads. Queues decouple producers from consumers. Mixing responsibilities creates invisible coupling that makes the system fragile under load.
Core concepts
System design assembles six core building blocks. Each solves a specific problem.
Load balancers distribute requests across backend instances. L4 balancers route by TCP/IP; L7 balancers route by HTTP path, headers, and cookies. Use L7 for HTTP services. Algorithms: round-robin (default), least-connections (when request latency varies), consistent hashing (when you need sticky routing, e.g., cache affinity).
Caches reduce read latency and database load. Sit in front of the database. Patterns: cache-aside (default), write-through (strong consistency), write-behind (high write throughput, tolerate loss). Key concerns: TTL, invalidation strategy, and stampede prevention. Redis is the default; Memcached only when pure key-value at massive scale.
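The cache-aside pattern named above can be sketched in a few lines. This is a minimal illustration, not a production client: a plain dict stands in for Redis, and the `CacheAside` class and its names are invented for this example.

```python
import time

class CacheAside:
    """Cache-aside: read through the cache, fall back to the DB on a miss."""
    def __init__(self, db, ttl_seconds=86_400):
        self.db = db          # stand-in for the source of truth
        self.cache = {}       # stand-in for Redis: key -> (value, expires_at)
        self.ttl = ttl_seconds

    def get(self, key):
        entry = self.cache.get(key)
        if entry and entry[1] > time.time():
            return entry[0]   # cache hit
        value = self.db[key]  # cache miss: read the source of truth
        self.cache[key] = (value, time.time() + self.ttl)
        return value

    def write(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)  # invalidate; the next read repopulates

db = {"abc123": "https://example.com/long"}
store = CacheAside(db)
store.get("abc123")                              # miss -> reads DB, populates cache
store.write("abc123", "https://example.com/new") # write-through to DB, delete from cache
print(store.get("abc123"))                       # repopulated after invalidation
```

Note the write path deletes rather than updates the cached value, matching the "DELETE then let the next read repopulate" rule later in this skill.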
Databases are the source of truth. SQL for structured data with ACID transactions; NoSQL for scale, flexible schemas, or specific access patterns. Read replicas for read-heavy workloads. Sharding for write-heavy workloads that exceed one node.
Message queues decouple producers from consumers and absorb traffic spikes. Use for async work, fan-out events, and unreliable downstream dependencies. Always configure a dead-letter queue. SQS for AWS-native work; Kafka for high-throughput event streaming or replay.
CDNs cache static assets and edge-terminate TLS close to users. Reduces origin load and cuts latency for geographically distributed users. Use for images, JS/CSS, and any content with high read-to-write ratio.
API gateways enforce cross-cutting concerns - auth, rate limiting, request logging, TLS termination - at a single entry point. Never build a custom gateway; use Kong, Envoy, or a managed provider.
Common tasks
Design a URL shortener
Clarifying questions: Read-heavy or write-heavy? Need analytics? Custom slugs? Global or single-region?
Components:
- API service (stateless, horizontally scaled) behind L7 load balancer
- Key generation service - pre-generate Base62 short codes in batches and store in a pool; avoids hot write path
- Database - a relational DB works at moderate scale; switch to Cassandra for multi-region or >100k writes/sec
- Cache (Redis) - store short_code -> long_url mappings; TTL 24 hours; cache-aside
Redirect flow: Client hits CDN -> cache hit returns 301/302 -> cache miss reads DB -> populates cache -> returns redirect.
Scale signal: 100M URLs stored, 10B reads/day -> cache hit rate must be >99% to protect the DB.
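For the short codes themselves, a common sketch is Base62-encoding an integer drawn from the pre-generated pool. The function below is illustrative, not part of any library:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

# 62^6 ~ 56.8 billion codes fit in 6 characters -- ample headroom for 100M URLs
print(base62_encode(125))  # "21" (2*62 + 1)
```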
Design a rate limiter
Algorithm choices:
- Token bucket (default) - allows bursts up to bucket capacity; fills at a constant rate. Best for user-facing APIs.
- Fixed window - simple counter per time window. Prone to burst at window edge.
- Sliding window log - exact, but memory-intensive.
- Sliding window counter - approximation using two fixed windows. Good balance.
Storage: Redis with atomic INCR and EXPIRE. Single Redis node is enough up to ~50k RPS per rule; use Redis Cluster for more.
Placement: In the API gateway (preferred) or as middleware. Always return X-RateLimit-Remaining and Retry-After headers with 429 responses.
Distributed concern: With multiple gateway nodes, the counter must be centralized (Redis) - local counters undercount.
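A minimal token-bucket sketch, single-process for clarity; in the distributed setup above the same refill-and-decrement logic would run atomically in Redis (e.g. as a Lua script):

```python
import time

class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # max burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_rate=1.0)  # burst of 3, 1 req/sec sustained
results = [bucket.allow() for _ in range(5)]
print(results)  # first 3 allowed, then rejected until the bucket refills
```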
Design a notification system
Components:
- Notification API - accepts events from internal services
- Router service - reads user preferences and determines channels (push, email, SMS)
- Channel-specific workers (separate services) - dequeued from per-channel queues
- Template service - renders notification copy
- Delivery tracking - records sent/delivered/failed per notification
Queue design: One queue per channel (push-queue, email-queue, sms-queue). Isolates failure - SMS provider outage does not back up email delivery.
Critical path vs non-critical path:
- OTP and security alerts: synchronous, priority queue
- Marketing and social notifications: async, best-effort, can be batched
Design a chat system
Protocol: WebSockets for real-time bidirectional messaging. Long-polling as fallback for restrictive networks.
Storage split:
- Message history: Cassandra, keyed by (channel_id, timestamp). Append-only, high write throughput, easy time-range queries.
- User presence and metadata: Redis (in-memory, fast reads).
- User and channel info: PostgreSQL (relational, ACID).
Fanout: When a user sends a message, the server writes to the DB and then publishes to a pub/sub channel (Redis Pub/Sub or Kafka). Each recipient's connection server subscribes to relevant channels and pushes to the WebSocket.
Scale concern: Connection servers are stateful (WebSockets). Route users to the same connection server with consistent hashing. Use a service mesh for connection server discovery.
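The consistent-hashing routing mentioned above can be sketched as a hash ring with virtual nodes. This is a minimal version under assumed names; production rings also handle node membership changes and replication:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node); vnodes smooth distribution
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        """Walk clockwise to the first vnode at or after the key's hash."""
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[idx][1]

ring = HashRing(["conn-1", "conn-2", "conn-3"])
# the same user always lands on the same connection server
assert ring.node_for("user:42") == ring.node_for("user:42")
```

When a connection server is added or removed, only the keys adjacent to its vnodes move, which is exactly why consistent hashing suits stateful WebSocket routing.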
Choose between SQL vs NoSQL
Use this decision table:
| Need | Choose |
|---|---|
| ACID transactions across multiple entities | SQL |
| Complex joins and ad-hoc queries | SQL |
| Strict schema with referential integrity | SQL |
| Horizontal write scaling beyond single node | NoSQL (Cassandra, DynamoDB) |
| Flexible or evolving schema | NoSQL (MongoDB, DynamoDB) |
| Graph traversals | Graph DB (Neo4j) |
| Time-series data at high ingestion rate | TimescaleDB or InfluxDB |
| Key-value at very high throughput | Redis or DynamoDB |
Default: Start with PostgreSQL. It handles far more scale than most teams expect and its JSONB column covers flexible-schema needs up to moderate scale. Migrate to specialized stores when you have a measured bottleneck.
Estimate system capacity
Use the following rough constants in back-of-envelope estimates:
| Metric | Value |
|---|---|
| Seconds per day | ~86,400 (~100k rounded) |
| Bytes per ASCII character | 1 |
| Average tweet/post size | ~300 bytes |
| Average image (compressed) | ~300 KB |
| Average video (1 min, 720p) | ~50 MB |
| QPS from 1M DAU, 10 actions/day | ~115 QPS |
Process:
- Clarify scale (DAU, requests per user per day)
- Derive QPS: (DAU * requests_per_day) / 86400
- Derive peak QPS: average QPS * 2-3x
- Derive storage: writes_per_day * record_size * retention_days
- Derive bandwidth: peak QPS * average_response_size
State assumptions explicitly. Interviewers care about your reasoning, not the exact number.
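The process above is simple arithmetic; a helper like this (names are illustrative, and it treats every action as a stored record for simplicity) keeps the assumptions explicit:

```python
def estimate(dau, actions_per_user_per_day, record_bytes, retention_days,
             peak_multiplier=3, response_bytes=500):
    """Back-of-envelope capacity estimate; every input is a stated assumption."""
    requests_per_day = dau * actions_per_user_per_day
    avg_qps = requests_per_day / 86_400
    peak_qps = avg_qps * peak_multiplier
    storage_bytes = requests_per_day * record_bytes * retention_days
    bandwidth_bps = peak_qps * response_bytes
    return {
        "avg_qps": round(avg_qps),
        "peak_qps": round(peak_qps),
        "storage_tb": storage_bytes / 1e12,
        "peak_bandwidth_mb_s": bandwidth_bps / 1e6,
    }

# 1M DAU at 10 actions/day -> ~116 average QPS (the table above rounds to ~115)
print(estimate(dau=1_000_000, actions_per_user_per_day=10,
               record_bytes=300, retention_days=365))
```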
Design caching strategy
Step 1 - Identify what to cache:
- Expensive reads that change infrequently (user profiles, product catalog)
- Computed aggregations (dashboard stats, leaderboards)
- Session tokens and auth lookups
Do NOT cache: frequently mutated data, financial balances, anything requiring strong consistency.
Step 2 - Choose pattern:
- Default: cache-aside with TTL
- Strong read-after-write: write-through
- High write throughput, loss acceptable: write-behind
Step 3 - Define invalidation:
- TTL expiry for most cases
- Explicit DELETE on write for cache-aside
- Never try to update a cached value in-place; DELETE then let the next read repopulate
Step 4 - Prevent stampede:
- Use a distributed lock (Redis SETNX) for high-traffic keys
- Add jitter to TTLs (base TTL +/- 10-20%) to spread expiry
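The stampede-prevention step above combines two small mechanisms, sketched here with a threading lock standing in for Redis SETNX (function names are illustrative):

```python
import random
import threading

def jittered_ttl(base_ttl: float, jitter_fraction: float = 0.15) -> float:
    """Spread expiries: base TTL +/- jitter so hot keys don't expire together."""
    return base_ttl * (1 + random.uniform(-jitter_fraction, jitter_fraction))

_locks = {}                      # key -> lock; stand-in for per-key Redis SETNX
_locks_guard = threading.Lock()  # protects the lock registry itself

def recompute_once(key, compute):
    """Only the caller holding the key's lock recomputes; others wait, then hit cache."""
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        return compute()

ttl = jittered_ttl(86_400)       # 24h +/- 15%
assert 86_400 * 0.85 <= ttl <= 86_400 * 1.15
```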
Anti-patterns / common mistakes
| Mistake | Why it's wrong | What to do instead |
|---|---|---|
| Designing without clarifying requirements | You optimize for the wrong bottleneck and miss key constraints | Always spend 5 minutes on scope: scale, consistency needs, latency SLAs |
| Sharding before replication | Sharding is complex and expensive; replication + caching handles most read bottlenecks | Add read replicas and caching first; only shard when writes are the bottleneck |
| Shared database between services | Creates hidden coupling; one service's slow query can kill another | One database per service; expose data through APIs or events |
| Cache without invalidation plan | Stale reads cause data inconsistency; cache-DB drift grows silently | Define TTL and invalidation triggers before adding any cache |
| Ignoring the tail: all QPS estimates as average | p99 latency matters more than p50; a 2x peak multiplier is the minimum | Always model peak QPS (2-3x average) and design capacity for it |
| Single point of failure at every layer | Load balancer with no standby, single queue broker, one region | Identify SPOFs explicitly; add redundancy for any component whose failure kills the system |
Gotchas
CAP theorem is about partitions, not a free choice - You cannot "choose" to sacrifice partition tolerance. P is always present in distributed systems. The real choice is between C and A when a partition occurs. Framing it as a three-way trade-off is wrong.
Caching invalidation is the hard part, not caching itself - Most designs add Redis without defining when data becomes stale. The moment a cache-aside entry is written, define the exact condition that invalidates it. "We'll figure that out later" causes stale reads in production.
Read replicas have replication lag - Writes go to the primary; reads from replicas may be 10-100ms stale. If you route reads to replicas immediately after writes (e.g., "create, then fetch profile"), users will see the old version. Use read-after-write consistency or route critical reads to primary.
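One common way to get read-after-write consistency is to pin a user's reads to the primary for a short window after any write, longer than the worst expected replication lag. A minimal sketch under assumed names:

```python
import time

class ReplicaRouter:
    """Route reads to a replica unless the user wrote recently (pin-to-primary)."""
    def __init__(self, pin_seconds=1.0):
        self.pin_seconds = pin_seconds  # should exceed worst-case replication lag
        self.last_write = {}            # user_id -> timestamp of last write

    def record_write(self, user_id):
        self.last_write[user_id] = time.monotonic()

    def target_for_read(self, user_id):
        wrote_at = self.last_write.get(user_id)
        if wrote_at and time.monotonic() - wrote_at < self.pin_seconds:
            return "primary"            # their own write may not have replicated yet
        return "replica"

router = ReplicaRouter(pin_seconds=1.0)
router.record_write("u1")
print(router.target_for_read("u1"))  # primary, right after the write
print(router.target_for_read("u2"))  # replica -- no recent write
```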
Consistent hashing does not eliminate hotspots - If one key receives dramatically more requests than others (celebrity user, viral post), consistent hashing still routes all requests for that key to the same shard. Solve with key-based sharding variants like adding a suffix, or cache at a higher layer.
Message queues do not guarantee exactly-once delivery - SQS standard queues deliver at-least-once; consumers must be idempotent. Kafka can deliver exactly-once within a single cluster but not across network boundaries. Design consumers to handle duplicate messages before relying on queue semantics.
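An idempotent consumer, as the gotcha above requires, can be sketched with a dedup store; here a set stands in for something persistent like a Redis SET with TTL:

```python
processed = set()   # stand-in for a persistent dedup store (e.g. Redis with TTL)
side_effects = []   # stands in for the real downstream work

def handle(message):
    """Process an at-least-once message safely: duplicates become no-ops."""
    msg_id = message["id"]
    if msg_id in processed:
        return "duplicate-skipped"
    side_effects.append(message["payload"])  # do the real work first
    processed.add(msg_id)                    # mark done only after the work succeeds
    return "processed"

print(handle({"id": "m1", "payload": "charge $5"}))  # processed
print(handle({"id": "m1", "payload": "charge $5"}))  # duplicate-skipped
assert side_effects == ["charge $5"]                 # the work ran exactly once
```

Note the ordering: marking the ID done before the work completes would turn at-least-once into at-most-once on a crash.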
References
For detailed frameworks and opinionated defaults, read the relevant file from the references/ folder:
- references/interview-framework.md - step-by-step interview process (RESHADED), time allocation, common follow-up questions, and how to communicate trade-offs
Only load the references file when the task requires it - it is long and will consume context.
References
interview-framework.md
System Design Interview Framework
A repeatable process for structuring system design interviews at FAANG and equivalent companies. The framework is called RESHADED and gives you a step-by-step structure so you never lose your place under pressure.
The RESHADED Framework
| Step | Letter | What you do |
|---|---|---|
| 1 | R - Requirements | Clarify functional and non-functional requirements |
| 2 | E - Estimation | Back-of-envelope capacity estimates |
| 3 | S - Storage | Choose data model and database type |
| 4 | H - High-level design | Draw the major components and data flow |
| 5 | A - APIs | Define the public-facing or internal API contract |
| 6 | D - Detail | Deep dive into 1-2 components the interviewer cares about |
| 7 | E - Evaluation | Identify bottlenecks, SPOFs, and trade-offs |
| 8 | D - Distinctive features | Scale, fault tolerance, or advanced features |
Step 1: Requirements (5 minutes)
Never start drawing boxes until you have nailed the requirements. Most failed interviews are lost in the first 5 minutes by jumping straight to solutions.
Functional requirements
Ask: "What should the system do? What are the core features?"
Focus on the minimum required - do not expand scope yourself. For a URL shortener:
- Users can shorten a URL
- Users can be redirected via the short URL
- (Optional) Users can see click analytics
Non-functional requirements
Ask these specifically - interviewers expect you to raise them:
| Question | Why it matters |
|---|---|
| What is the expected scale? (DAU, QPS) | Determines if a single machine works or you need distribution |
| What is the read/write ratio? | Drives caching strategy and DB choice |
| What is the latency SLA? | Determines where caches are needed |
| Is strong consistency required or is eventual OK? | Drives CAP choice |
| What is the availability target? (99.9%? 99.99%?) | Determines redundancy level |
| Geographic distribution needed? | Determines if multi-region is in scope |
| Data retention period? | Affects storage estimates |
Functional vs non-functional checklist
- Core user actions defined (create, read, update, delete what?)
- Expected scale confirmed (DAU, peak QPS)
- Consistency requirement confirmed (strong vs eventual)
- Latency SLA noted
- Out-of-scope features explicitly named
Step 2: Estimation (5 minutes)
Do this out loud. Interviewers assess your ability to reason about numbers, not get them exactly right.
Reference constants
| Metric | Value |
|---|---|
| Seconds per day | 86,400 (use 100,000 to round up) |
| Bytes per ASCII char | 1 byte |
| Average URL | 100 bytes |
| Average user profile | 1 KB |
| Average tweet/post | 300 bytes |
| Average image (compressed JPEG) | 300 KB |
| Average HD video (1 minute) | 50 MB |
| 1 TB | 10^12 bytes |
| 1 PB | 10^15 bytes |
Estimation process
1. QPS: (DAU * actions_per_user_per_day) / 86,400
2. Peak QPS: average QPS * 2-3x
3. Storage: writes_per_day * record_size * retention_days
4. Bandwidth: peak QPS * average_response_size
5. Cache: storage * hot_data_fraction (typically 20% of data = 80% of reads)
Example - URL shortener
Scale: 100M DAU, 1 write per 10 users per day, 100 reads per write
Writes: (100M / 10) / 86400 = 10M/day = ~116 writes/sec
Reads: 116 * 100 = 11,600 reads/sec
Storage: 10M records/day * 100 bytes * 365 days * 5 years = ~1.8 TB
Cache: ~20% of URLs serve ~80% of reads -> cache a few hundred GB of hot URLs
State your assumptions explicitly: "I'm assuming 100 bytes per URL and 5-year retention."
Step 3: Storage (3-5 minutes)
Define the data model first, then pick the technology.
Data model
List the main entities and their core fields:
User: user_id (PK), email, created_at
URL: short_code (PK), original_url, user_id (FK), created_at, expires_at
Click: click_id (PK), short_code (FK), timestamp, ip, referrer
Database selection decision table
| Requirement | Choice |
|---|---|
| ACID transactions, complex joins, well-defined schema | PostgreSQL (default) |
| Horizontal write scaling (>100K writes/sec) | Cassandra or DynamoDB |
| Flexible/evolving schema + JSON documents | MongoDB |
| Key-value at very high throughput | Redis or DynamoDB |
| Full-text search | Elasticsearch |
| Graph traversals (social networks) | Neo4j |
| Time-series (metrics, IoT) | InfluxDB or TimescaleDB |
Default: PostgreSQL. Change only with a specific measured reason.
Sharding strategy (if needed)
| Strategy | Best for | Risk |
|---|---|---|
| Range-based | Time-series data | Hot partitions if traffic is recent-biased |
| Hash-based | Even distribution needed | Cannot do range queries across shards |
| Directory-based | Complex routing logic | Directory itself becomes SPOF |
Step 4: High-Level Design (10 minutes)
Draw the system on a whiteboard (or describe components clearly). Include:
- Client (mobile, web, CLI)
- DNS - not usually a design concern, but note CDN here
- CDN - for static assets and geographically distributed reads
- Load balancer - L7 for HTTP services
- API servers - stateless, horizontally scalable
- Cache - Redis, positioned between API servers and database
- Database - primary + read replicas
- Message queue - if async processing is needed
- Workers - consumers of the queue
Describe the data flow for the two most critical use cases (e.g., write a URL, redirect a URL). Say what happens at each hop.
Step 5: APIs (3-5 minutes)
Define the external API contracts. Be specific about HTTP method, path, and payload shape.
POST /api/v1/urls
Body: { "original_url": "https://...", "custom_slug": "optional" }
Response 201: { "short_url": "https://sho.rt/abc123", "expires_at": "..." }
GET /{short_code}
Response 301: Location: https://original-url.com
Response 404: { "error": "not found" }
GET /api/v1/urls/{short_code}/stats
Response 200: { "clicks": 1234, "last_clicked_at": "..." }
Note: use 301 (permanent) redirects to save server load via browser caching. Use 302 (temporary) if you need to track every click server-side.
Step 6: Detail (10 minutes)
Pick 1-2 components and go deep. Let the interviewer guide which one. Common deep-dive areas:
Deep dive: key generation
Naive approach: generate a random 6-char Base62 code on each write. Problem: hash collisions as the dataset grows.
Better approach: pre-generation pool
- A background worker generates Base62 codes offline and stores them in a key_pool table (status: unused/used).
- On each write request, the API server picks one key from the pool (atomic SELECT + UPDATE to mark it used).
- Keep two copies: used_keys and unused_keys for fast failover.
Benefit: no collision checking at write time; sub-millisecond key assignment.
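The SELECT + UPDATE claim can be sketched with SQLite standing in for the key_pool table (table and column names follow the sketch above; atomicity here relies on a single connection, where a real deployment would use SELECT ... FOR UPDATE or an atomic UPDATE ... RETURNING):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE key_pool (short_code TEXT PRIMARY KEY, used INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO key_pool (short_code) VALUES (?)",
                 [("abc123",), ("def456",), ("ghi789",)])

def claim_key(conn):
    """Claim one unused key inside a transaction; returns None when the pool is empty."""
    with conn:  # one transaction: select a free key, mark it used
        row = conn.execute(
            "SELECT short_code FROM key_pool WHERE used = 0 LIMIT 1").fetchone()
        if row is None:
            return None
        conn.execute("UPDATE key_pool SET used = 1 WHERE short_code = ?", (row[0],))
        return row[0]

k1 = claim_key(conn)
k2 = claim_key(conn)
print(k1, k2)  # two distinct codes, each now marked used
```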
Deep dive: caching strategy
Read path:
1. Check Redis: GET short_code
2. Hit: return URL, record impression asynchronously
3. Miss: query PostgreSQL, cache result with TTL=24h, return URL
Write path:
1. Insert to PostgreSQL
2. Do NOT pre-populate cache (lazy loading - cache-aside)
3. Hot URLs will self-populate after the first redirect
Invalidation: when a URL is deleted or expires, DEL short_code from Redis and let the next read repopulate it naturally.
Step 7: Evaluation (5 minutes)
Walk through three questions:
1. Where are the single points of failure?
For each component, ask: "what happens if this dies?"
- Load balancer -> add standby LB with failover (AWS ELB handles this)
- Primary DB -> add replica; promote on failure (< 30 sec with auto-failover)
- Redis -> add replica; cache misses fall back to DB (system degrades gracefully)
- Message queue -> managed service (SQS/Kafka) has built-in replication
2. Where are the bottlenecks?
Use your Step 2 estimates:
- 11,600 reads/sec -> Redis at < 1ms is the right tool; DB alone can't handle this
- 116 writes/sec -> PostgreSQL handles this easily; no sharding needed
3. What trade-offs did you make?
Name at least two:
- "I chose 301 redirects over 302 to reduce server load, which means we can't track every click without client-side instrumentation."
- "I'm using eventual consistency for click analytics to avoid blocking the redirect flow on write."
Step 8: Distinctive Features (5 minutes)
Push to the next level if time allows. Pick one:
Multi-region active-active
- Route users to the nearest region via GeoDNS
- Replicate URL metadata with async replication (AP - eventual consistency)
- Use CRDT counters for click analytics to merge without conflicts
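The CRDT counter mentioned above is typically a grow-only counter (G-Counter): each region increments its own slot, and merging takes per-region maxima, so replicas converge without conflicts. A minimal sketch:

```python
class GCounter:
    """Grow-only CRDT counter: per-region slots; merge takes element-wise max."""
    def __init__(self, region):
        self.region = region
        self.counts = {}  # region -> highest count observed from that region

    def increment(self, n=1):
        self.counts[self.region] = self.counts.get(self.region, 0) + n

    def merge(self, other):
        for region, count in other.counts.items():
            self.counts[region] = max(self.counts.get(region, 0), count)

    def value(self):
        return sum(self.counts.values())

us, eu = GCounter("us-east"), GCounter("eu-west")
us.increment(3)
eu.increment(2)
us.merge(eu)  # merge is commutative, associative, and idempotent
eu.merge(us)
assert us.value() == eu.value() == 5
```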
Fault tolerance
- Add a circuit breaker on the DB pool; serve a "please try again" page when the DB is unhealthy rather than queuing connections until OOM
- Use bulkhead pattern: analytics writes go through a separate connection pool; an analytics storm cannot exhaust the redirect connection pool
Hot key problem
- A viral URL can exceed one Redis node's throughput
- Solution: replicate the hot key to N Redis nodes; read from a random one
- Or: local in-process cache (LRU, 100 entries, 10-second TTL) on each API server
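The local in-process cache option can be sketched as an LRU bounded to a fixed size with a per-entry TTL, matching the numbers above (class name is illustrative):

```python
import time
from collections import OrderedDict

class TTLLRUCache:
    """Tiny in-process cache: bounded size (LRU eviction) plus per-entry TTL."""
    def __init__(self, max_entries=100, ttl_seconds=10.0):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.data = OrderedDict()  # key -> (value, expires_at), in LRU order

    def get(self, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self.data[key]         # expired: treat as a miss
            return None
        self.data.move_to_end(key)     # mark as recently used
        return value

    def put(self, key, value):
        self.data[key] = (value, time.monotonic() + self.ttl)
        self.data.move_to_end(key)
        if len(self.data) > self.max_entries:
            self.data.popitem(last=False)  # evict the least recently used entry

cache = TTLLRUCache(max_entries=2, ttl_seconds=10)
cache.put("hot", "https://viral.example")
cache.put("warm", "x")
cache.get("hot")       # touch "hot" so "warm" becomes the LRU entry
cache.put("new", "y")  # evicts "warm"
assert cache.get("warm") is None and cache.get("hot") is not None
```

Because each API server keeps its own copy, a viral key's traffic is absorbed locally and the 10-second TTL bounds staleness.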
Time allocation
Total typical interview: 45-60 minutes.
| Step | Time |
|---|---|
| Requirements (R) | 5 min |
| Estimation (E) | 3-5 min |
| Storage (S) | 3-5 min |
| High-level design (H) | 10 min |
| APIs (A) | 3-5 min |
| Detail (D) | 10-15 min |
| Evaluation (E) | 5 min |
| Distinctive features (D) | 5 min |
| Buffer / interviewer questions | 5 min |
Rule: Never spend more than 15 minutes on high-level design without the interviewer's prompting. Move to detail early - that's where most marks are given.
Common follow-up questions and how to answer them
| Question | Key points to hit |
|---|---|
| "How does this scale to 10x traffic?" | Identify which component breaks first; add the right layer (cache, replica, shard) |
| "What happens if the database goes down?" | Describe read replica failover, circuit breaker behavior, graceful degradation |
| "How do you handle duplicate requests?" | Idempotency key on writes; atomic check-and-set in Redis |
| "How do you handle hot keys in the cache?" | Local in-process cache + jitter on TTLs |
| "What if a message is delivered twice?" | Idempotent consumers with a deduplication store |
| "How would you monitor this system?" | RED metrics per service, alerting on SLO burn rate, distributed tracing |
Common design patterns by problem type
| Problem type | Key patterns |
|---|---|
| URL shortener | Pre-generation key pool, cache-aside, 301 redirect |
| Feed (Twitter/Instagram) | Fan-out on write (small accounts) + fan-out on read (celebrities), Redis sorted sets |
| Typeahead / autocomplete | Trie in Redis, prefix hash, tiered caching |
| Distributed counter | Redis INCR, approximate counting (HyperLogLog), eventual consistency |
| Distributed lock | Redis SETNX with expiry, Redlock for multi-node |
| Leaderboard | Redis sorted sets (ZADD, ZRANGE) |
| Search | Elasticsearch with inverted index; Kafka for real-time indexing pipeline |
| Video/image upload | Direct S3 upload with presigned URL; metadata in PostgreSQL; CDN for delivery |
| Payment system | Idempotency keys, ACID transactions, CP database, event sourcing for audit |
Interview anti-patterns to avoid
- Jumping to microservices before the problem demands it - start with a monolith unless requirements clearly show independent scaling needs
- Designing in silence - narrate every decision; interviewers score your thinking, not just the diagram
- Over-engineering the happy path, ignoring failure modes - explicitly name what happens when each component fails
- Picking exotic tech to impress - using Cassandra for 1000 QPS is wrong; PostgreSQL is the right answer
- Refusing to make trade-offs - everything is a trade-off; say so, then commit to one option and justify it
- Ignoring the non-functional requirements you agreed on - if you said p99 < 100ms, every component decision must serve that constraint