technical-seo
Use this skill when working on technical SEO infrastructure - crawlability, indexing, XML sitemaps, canonical URLs, robots.txt, redirect chains, rendering strategies (SSR/SSG/ISR/CSR), crawl budget optimization, and search engine rendering. Triggers on fixing indexing issues, configuring crawl directives, choosing rendering strategies for SEO, debugging Google Search Console errors, or auditing site architecture for search engines.
technical-seo
technical-seo is a production-ready AI agent skill for claude-code, gemini-cli, openai-codex, and MCP. It covers working on technical SEO infrastructure - crawlability, indexing, XML sitemaps, canonical URLs, robots.txt, redirect chains, rendering strategies (SSR/SSG/ISR/CSR), crawl budget optimization, and search engine rendering.
Quick Facts
| Field | Value |
|---|---|
| Category | marketing |
| Version | 0.1.0 |
| Platforms | claude-code, gemini-cli, openai-codex, mcp |
| License | MIT |
How to Install
- Make sure you have Node.js installed on your machine.
- Run the following command in your terminal:
npx skills add AbsolutelySkilled/AbsolutelySkilled --skill technical-seo
- The technical-seo skill is now available in your AI coding agent (Claude Code, Gemini CLI, OpenAI Codex, etc.).
Overview
The infrastructure layer of SEO. Technical SEO ensures search engines can discover, crawl, render, and index your pages. It is the foundation - if crawling fails, content quality and link building are irrelevant. This skill covers the crawl-index-rank pipeline and the engineering decisions that make or break search visibility.
Tags
seo technical-seo crawlability sitemaps canonicals rendering indexing
Platforms
- claude-code
- gemini-cli
- openai-codex
- mcp
Frequently Asked Questions
What is technical-seo?
technical-seo is an AI agent skill for technical SEO infrastructure work: crawlability, indexing, XML sitemaps, canonical URLs, robots.txt, redirect chains, rendering strategies (SSR/SSG/ISR/CSR), and crawl budget optimization. It triggers on fixing indexing issues, configuring crawl directives, choosing rendering strategies for SEO, debugging Google Search Console errors, or auditing site architecture for search engines.
How do I install technical-seo?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill technical-seo in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support technical-seo?
This skill works with claude-code, gemini-cli, openai-codex, mcp. Install it once and use it across any supported AI coding agent.
Maintainers
Generated from AbsolutelySkilled
SKILL.md
Technical SEO
The infrastructure layer of SEO. Technical SEO ensures search engines can discover, crawl, render, and index your pages. It is the foundation - if crawling fails, content quality and link building are irrelevant. This skill covers the crawl-index-rank pipeline and the engineering decisions that make or break search visibility.
When to use this skill
Trigger this skill when the user:
- Reports pages not showing in Google Search or Index Coverage errors in Search Console
- Needs to configure or debug `robots.txt` directives
- Wants to generate or fix an XML sitemap
- Is setting up canonical URLs or resolving duplicate content issues
- Has redirect chains or wants to audit redirects
- Is choosing a rendering strategy (SSR, SSG, ISR, CSR) with SEO as a constraint
- Is debugging why Googlebot cannot see content that users can
- Wants to optimize crawl budget on a large site (10k+ pages)
Do NOT trigger this skill for:
- Content strategy, editorial calendars, or keyword research
- Link building, backlink analysis, or off-page SEO
Key principles
Crawlable before rankable - A page that Googlebot cannot reach cannot rank. Discovery is step one in the pipeline. Fix crawl and index issues before any other SEO work. Crawlability is a precondition, not a ranking factor.
One canonical URL per piece of content - Every distinct piece of content must have exactly one URL that all signals consolidate on. HTTP vs HTTPS, www vs non-www, trailing slash vs none, query parameters - each variant dilutes ranking signals unless canonicalized to a single source of truth.
Rendering strategy is an SEO architecture decision - Whether your page is rendered at build time (SSG), at request time on the server (SSR), or in the browser (CSR) determines whether Googlebot sees your content on the first crawl or must wait for a second-wave JavaScript render. Make this decision deliberately.
robots.txt blocks crawling, not indexing - A page blocked in `robots.txt` can still be indexed if other pages link to it. Googlebot sees the URL via links but cannot read the content, so it may index a thin or empty page. Use `noindex` in the HTTP response header or meta tag to prevent indexing, not `robots.txt`.
Redirect chains waste crawl budget and dilute link equity - Each hop in a redirect chain costs crawl budget and reduces the link equity passed through. Keep all redirects as single-hop 301s from old URL directly to final destination.
Core concepts
The crawl-index-rank pipeline
Three sequential phases - failure in any phase stops everything downstream:
| Phase | What happens | Common failure modes |
|---|---|---|
| Crawl | Googlebot discovers and fetches the URL | robots.txt block, slow server, crawl budget exhausted |
| Index | Google processes and stores the page | noindex directive, duplicate content, thin content, render failure |
| Rank | Google assigns position for queries | Content quality, E-E-A-T, links, page experience |
Crawl budget
Crawl budget is the number of URLs Googlebot will crawl on your site within a given timeframe. It is a product of crawl rate (how fast Googlebot can crawl without overloading the server) and crawl demand (how much Google wants to crawl based on page value and freshness).
Who needs to care about crawl budget:
- Sites with 10k+ pages
- Sites with large faceted navigation generating URL permutations
- Sites with many low-value or duplicate URLs (pagination, filters, sessions in URLs)
- Sites with frequent content updates that need fast re-indexing
Small sites (<1k pages) with clean architecture rarely face crawl budget problems.
Rendering for crawlers
Googlebot can execute JavaScript but does so in a second wave, sometimes days after the initial crawl. Content invisible without JavaScript is at risk:
| Rendering | Googlebot sees on first crawl | SEO risk |
|---|---|---|
| SSG (static) | Full HTML | None |
| SSR (server-side) | Full HTML | None |
| ISR (incremental static) | Full HTML (on cache hit) | Minor - stale cache shows old content |
| CSR (client-side only) | Empty shell | High - content may not be indexed |
URL parameter handling
URL parameters are a major source of duplicate content. Common problematic patterns:
- Tracking parameters: `?utm_source=email&utm_campaign=launch`
- Faceted navigation: `?color=red&size=M&sort=price`
- Session IDs: `?sessionid=abc123`
- Pagination: `?page=2`

Handle with canonical tags pointing to the clean URL, or robots.txt `Disallow` rules for pure tracking parameters. (Google Search Console's URL Parameters tool has been retired and is no longer an option.)
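The normalization rules above can be sketched as a small helper. This is an illustrative sketch only: the tracking-parameter list and the www/trailing-slash policy are assumptions - match them to your own canonical policy.

```typescript
// Hypothetical helper: strip tracking params and normalize ordering so
// URL variants collapse to a single canonical form.
const TRACKING_PARAMS = new Set([
  'utm_source', 'utm_medium', 'utm_campaign', 'gclid', 'fbclid', 'sessionid',
]);

function canonicalizeUrl(raw: string): string {
  const url = new URL(raw);
  url.protocol = 'https:';                            // enforce HTTPS
  url.hostname = url.hostname.replace(/^www\./, '');  // enforce non-www (assumed policy)
  // Drop tracking parameters entirely
  for (const key of [...url.searchParams.keys()]) {
    if (TRACKING_PARAMS.has(key)) url.searchParams.delete(key);
  }
  url.searchParams.sort();                            // stable parameter order
  // Enforce no trailing slash (except root)
  if (url.pathname.length > 1 && url.pathname.endsWith('/')) {
    url.pathname = url.pathname.slice(0, -1);
  }
  url.hash = '';
  return url.toString();
}
```

Run this on every internally generated link so the canonical form is the only form your templates ever emit.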
Mobile-first indexing
Google indexes and ranks primarily based on the mobile version of your content. Ensure the mobile version has: the same content as desktop, the same structured data, and equivalent meta tags. Blocked mobile CSS/JS is a common cause of mobile-first indexing failures.
Common tasks
Configure robots.txt
```
# Allow all crawlers to access all content (default, no file needed)
User-agent: *
Allow: /

# Block specific directories from all crawlers
User-agent: *
Disallow: /admin/
Disallow: /internal-search/
Disallow: /checkout/
Disallow: /*?*sessionid=  # block session ID URLs anywhere on the site

# Allow Googlebot to crawl CSS and JS (critical - never block these)
User-agent: Googlebot
Allow: /*.js$
Allow: /*.css$

# Point to sitemap
Sitemap: https://example.com/sitemap.xml
```
Never disallow CSS or JS. Googlebot needs them to render your pages. Blocking them degrades rendering quality and can hurt rankings.
Generate an XML sitemap
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/products/widget</loc>
    <lastmod>2024-01-10</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```
For large sites, use a sitemap index:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemaps/products.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemaps/blog.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
</sitemapindex>
```
Sitemap rules: max 50,000 URLs per file, max 50MB uncompressed. Only include canonical, indexable URLs. Only include `lastmod` if it reflects genuine content changes - Googlebot learns to ignore dishonest `lastmod` values.
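The rules above can be enforced at generation time. A sketch - the file naming scheme (`/sitemaps/sitemap-N.xml`) and the entry shape are assumptions, not part of the sitemap spec:

```typescript
// Sketch: emit sitemap XML chunked at the 50,000-URL spec limit, plus an
// index file referencing each chunk. Types and paths are illustrative.
interface SitemapEntry { loc: string; lastmod?: string; }

const MAX_URLS_PER_SITEMAP = 50000;

function buildSitemap(entries: SitemapEntry[]): string {
  const urls = entries.map((e) =>
    `  <url>\n    <loc>${e.loc}</loc>\n` +
    (e.lastmod ? `    <lastmod>${e.lastmod}</lastmod>\n` : '') +
    `  </url>`
  ).join('\n');
  return `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>`;
}

function buildSitemapFiles(entries: SitemapEntry[], baseUrl: string): { index: string; sitemaps: string[] } {
  const sitemaps: string[] = [];
  // Chunk at the spec limit so no single file exceeds 50,000 URLs
  for (let i = 0; i < entries.length; i += MAX_URLS_PER_SITEMAP) {
    sitemaps.push(buildSitemap(entries.slice(i, i + MAX_URLS_PER_SITEMAP)));
  }
  const refs = sitemaps.map((_, i) =>
    `  <sitemap>\n    <loc>${baseUrl}/sitemaps/sitemap-${i + 1}.xml</loc>\n  </sitemap>`
  ).join('\n');
  const index = `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${refs}\n</sitemapindex>`;
  return { index, sitemaps };
}
```

Feed it only canonical, indexable URLs, and only pass `lastmod` when the content genuinely changed.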
Set up canonical URLs
In the `<head>` element:
```html
<link rel="canonical" href="https://example.com/products/widget" />
```
Handle all URL variants consistently:
```html
<!-- All of these should resolve to one canonical form -->
<!-- https://example.com/products/widget/ -->
<!-- https://example.com/products/widget -->
<!-- http://example.com/products/widget -->
<!-- https://www.example.com/products/widget -->

<!-- All pages declare the same canonical -->
<link rel="canonical" href="https://example.com/products/widget" />
```
For paginated pages, each page is canonically itself (do not canonical page 2 to page 1 unless they have identical content):
```html
<!-- Page 1 -->
<link rel="canonical" href="https://example.com/blog" />
<!-- Page 2 -->
<link rel="canonical" href="https://example.com/blog?page=2" />
```
Choose a rendering strategy
Decision table for ranking pages (pages you want to appear in search):
| Content type | Recommended strategy | Rationale |
|---|---|---|
| Marketing pages, landing pages | SSG | Crawled immediately, fast TTFB |
| Blog posts, documentation | SSG | Rarely changes, build on publish |
| Product pages (10k-100k) | ISR | Manageable builds, auto-updates |
| User profiles, social content | SSR | Personalized but crawlable |
| Search results, filters | SSR + canonical | Crawlable canonical version |
| Dashboards, account pages | CSR is fine | Behind auth, not indexed anyway |
For Next.js:
```js
// SSG - crawled immediately, best for ranking pages
export async function generateStaticParams() { ... }

// ISR - rebuilds on demand, good for large catalogs
export const revalidate = 3600; // revalidate every hour

// SSR - server renders on every request
export const dynamic = 'force-dynamic';
```
Fix redirect chains
Redirect chains occur when A -> B -> C instead of A -> C directly. Detect and fix:
```shell
# Detect redirect chain depth with curl
curl -L -o /dev/null -s -w "%{url_effective} hops: %{num_redirects}\n" \
  https://example.com/old-page

# Follow the chain step by step
curl -I https://example.com/old-page
# Note Location header, then:
curl -I https://example.com/intermediate-page
```
Fix by updating the origin redirect to point directly to the final URL:
```nginx
# Before: /old-page -> /intermediate -> /final-page (chain)
# After:  /old-page -> /final-page (single hop)
rewrite ^/old-page$ /final-page permanent;
```
Rules:
- 301 = permanent redirect (passes link equity, cached by browsers)
- 302 = temporary redirect (does not pass full link equity, not cached)
- Use 301 for SEO unless the redirect is genuinely temporary
- Client-side redirects (`window.location`, meta refresh) do not reliably pass link equity. Always redirect at the server or CDN layer.
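The single-hop rule can be enforced when generating redirect config. A sketch, assuming your redirects live in a simple source-to-target map:

```typescript
// Sketch: flatten a redirect map so every source points directly at its
// final destination, and flag loops. The map shape is illustrative.
function flattenRedirects(redirects: Record<string, string>): Record<string, string> {
  const flattened: Record<string, string> = {};
  for (const source of Object.keys(redirects)) {
    const seen = new Set<string>([source]);
    let target = redirects[source];
    // Follow the chain until we reach a URL with no further redirect
    while (target in redirects) {
      if (seen.has(target)) throw new Error(`Redirect loop at ${target}`);
      seen.add(target);
      target = redirects[target];
    }
    flattened[source] = target; // single hop: source -> final destination
  }
  return flattened;
}
```

Running this before emitting nginx `rewrite` rules (or a CDN redirect list) guarantees every redirect is a single 301 hop.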
Handle URL parameters for faceted navigation
Faceted navigation generates an exponential number of URL combinations. Choose one:
Option A: Canonical to the base category page (simplest)
```html
<!-- /products?color=red&size=M&sort=price -->
<link rel="canonical" href="https://example.com/products" />
```
Option B: robots.txt disallow parameter combinations
```
User-agent: *
Disallow: /*?*color=
Disallow: /*?*size=
Disallow: /*?*sort=
```
Option C: Noindex on parameterized pages
```html
<meta name="robots" content="noindex, follow" />
```
Option A is preferred when the canonical page has good content. Option B is useful when you want to conserve crawl budget. Option C is the fallback when you need to serve the page to users but not have it indexed.
Set up meta robots directives
In the HTML `<head>`:
```html
<!-- Default: crawl and index (no tag needed) -->
<meta name="robots" content="index, follow" />
<!-- Do not index, but follow links on this page -->
<meta name="robots" content="noindex, follow" />
<!-- Do not index, do not follow links -->
<meta name="robots" content="noindex, nofollow" />
<!-- Prevent Google from showing a cached version -->
<meta name="robots" content="index, follow, noarchive" />
```
Via HTTP response header (works for non-HTML resources like PDFs):
```
X-Robots-Tag: noindex
X-Robots-Tag: noindex, nofollow
```
Debug indexing issues
When a page is not indexed, work through this checklist in order:
- URL Inspection tool in Search Console - checks crawl status, last crawl, indexing decision, and renders a screenshot of what Googlebot sees
- robots.txt tester - confirm the URL is not blocked
- Live URL test - request indexing and see if Googlebot can render the page
- Check for noindex - view source and search for `noindex`, check HTTP headers
- Check canonical - is the canonical pointing to a different URL?
- Check content - is there enough unique, substantive content?
- Check internal links - is the page linked from anywhere Googlebot can reach?
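Steps 4 and 5 of the checklist can be partially automated. A regex-based sketch for illustration only - a real audit should use an HTML parser, and these regexes assume `content`/`href` appear after `name`/`rel` in attribute order:

```typescript
// Sketch: scan a fetched page's HTML for indexing blockers.
interface IndexCheck { noindex: boolean; canonical: string | null; }

function checkIndexability(html: string): IndexCheck {
  // Assumes attribute order: name before content, rel before href
  const metaRobots = html.match(/<meta[^>]+name=["']robots["'][^>]*content=["']([^"']+)["']/i);
  const canonicalMatch = html.match(/<link[^>]+rel=["']canonical["'][^>]*href=["']([^"']+)["']/i);
  return {
    noindex: metaRobots !== null && /noindex/i.test(metaRobots[1]),
    canonical: canonicalMatch !== null ? canonicalMatch[1] : null,
  };
}
```

If `noindex` is true or `canonical` points at a different URL than the one you inspected, you have found the indexing blocker.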
Anti-patterns / common mistakes
| Mistake | Why it is wrong | What to do instead |
|---|---|---|
| Blocking CSS/JS in robots.txt | Googlebot cannot render pages, sees empty shells | Allow: /*.js$ and Allow: /*.css$ explicitly |
| Dishonest lastmod in sitemap | Googlebot learns to ignore it; all URLs get low-priority crawls | Only update lastmod on genuine content changes |
| CSR-only rendering for rankable pages | Content in JS is not seen on first crawl; delayed or failed indexing | Use SSG or SSR for any page you want in search results |
| Client-side redirects for SEO | Meta refresh and JS redirects do not reliably pass link equity | Redirect at server/CDN level with 301 |
| Using robots.txt to prevent indexing | Blocked pages can still be indexed as empty/thin if linked to | Use noindex directive in response headers or meta tag |
| Self-referential canonical loops | Page A canonicals to B, B canonicals to A; Google ignores both | Each URL canonicals to a single definitive URL |
| Duplicate canonicals pointing to 404s | Signals to Google the canonical URL is invalid | Ensure canonical targets return 200 with real content |
| Trailing slash inconsistency | Two URLs for every page, dilutes crawl budget and link signals | Enforce one form at the server, canonical the other |
| Noindex on paginated pages in series | First page gets indexed without context of full series | Only noindex pagination if pages are truly thin/duplicate |
| Sitemap URLs not matching canonicals | Confuses Googlebot about which URL is authoritative | Sitemap URLs must exactly match their canonical <link> tag |
Gotchas
Canonical tags are a hint, not a directive - Google can and does override canonical tags if it disagrees with your signal. If your site serves near-identical content at two URLs and one has more internal links, Google may index the more-linked URL regardless of your canonical. Canonical must be reinforced with consistent internal linking and redirects.
Blocking CSS/JS in robots.txt causes Googlebot to see a broken page - Even a single blocked CSS file can prevent Googlebot from rendering a page correctly, leading to indexing of an empty or unstyled shell. Always test with the URL Inspection tool's "Test Live URL" option to confirm Googlebot's rendered view.
noindex in robots.txt does not work - The `noindex` directive in robots.txt is not a valid robots.txt directive; it is ignored. The only valid `noindex` placement is in the HTTP response header (`X-Robots-Tag`) or the HTML `<meta name="robots">` tag on the page itself.
Hreflang is validated in both directions - If page A in English points to page B in French, page B must also point back to page A. Missing the return link causes Google to ignore the entire hreflang cluster. Validate every hreflang implementation bidirectionally.
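The bidirectional check can be automated. A sketch, assuming you have already extracted each page's declared hreflang alternates into a map:

```typescript
// Sketch: verify hreflang reciprocity. `pages` maps a URL to the alternate
// URLs it declares; every declared alternate must declare a link back.
function findMissingReturnLinks(pages: Record<string, string[]>): [string, string][] {
  const missing: [string, string][] = [];
  for (const [url, alternates] of Object.entries(pages)) {
    for (const alt of alternates) {
      // The alternate must list `url` among its own hreflang targets
      if (!pages[alt] || !pages[alt].includes(url)) {
        missing.push([url, alt]); // url declares alt, but alt does not reciprocate
      }
    }
  }
  return missing;
}
```

Every pair this returns represents a broken hreflang cluster that Google will ignore until the return link is added.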
Sitemap `lastmod` dates that never change train Googlebot to deprioritize your site - Many CMSs emit the current date as `lastmod` on every page regardless of actual changes. Googlebot learns these dates are dishonest and reduces crawl frequency. Only emit `lastmod` when content genuinely changed.
References
For detailed implementation guidance, load the relevant reference file:
- `references/crawlability-indexing.md` - crawl budget optimization, Googlebot behavior, log analysis, orphan pages, internal linking for crawlability
- `references/sitemaps-canonicals.md` - XML sitemap spec details, canonical URL rules, hreflang interaction, pagination handling
- `references/rendering-strategies.md` - SSG/SSR/ISR/CSR comparison, framework implementations (Next.js, Nuxt, Astro, Remix), edge rendering, dynamic rendering
Only load a reference file if the current task requires it - they are long and will consume context.
References
crawlability-indexing.md
Crawlability and Indexing Reference
Deep reference for ensuring Googlebot can discover, crawl, and index your pages efficiently. Covers crawl budget mechanics, log file analysis, orphan page detection, internal linking strategy, and the index coverage pipeline.
1. Googlebot Behavior
How Googlebot discovers URLs
- Sitemap submission via Search Console or the `Sitemap:` directive in robots.txt
- Following links from already-crawled pages (internal and external)
- Submitted URLs via URL Inspection tool in Search Console
- DNS records and link graphs maintained internally
Googlebot does not crawl pages it cannot reach via links or sitemaps. An isolated page - reachable only via direct URL entry - is an orphan and will likely not be discovered or re-crawled.
Crawl frequency signals
Googlebot adjusts crawl frequency based on:
| Signal | Effect |
|---|---|
| High-quality inbound links to the page | Increases crawl priority |
| Accurate lastmod in sitemap | Influences recrawl scheduling |
| Fast server response time | Allows more crawls within same rate limit |
| Consistent 200 status (not flapping) | Builds crawl confidence |
| Fresh content on recrawl | Increases future crawl frequency |
| 5xx errors on crawl | Reduces crawl rate; Google backs off |
Crawl rate vs crawl demand
Crawl rate: How fast Googlebot is allowed to crawl. Controlled by server response speed and the crawl rate setting (adjustable in Search Console, though rarely needed). Google auto-adjusts to avoid overloading your server.
Crawl demand: How much Google wants to crawl your site. Driven by page popularity, freshness signals, and link graph centrality. You influence demand by improving content quality and earning links.
Crawl budget = effective crawls per day = min(crawl rate, crawl demand).
You cannot directly set crawl budget. You influence it by:
- Removing low-value URLs from the crawl frontier
- Improving server response speed
- Ensuring crawled pages return 200 with real content
- Building internal links to important pages
2. Crawl Budget Optimization
Identify URL budget waste
Common sources of wasted crawl budget:
| Source | Volume | Fix |
|---|---|---|
| Faceted navigation permutations | Can be millions | Canonicals + robots.txt disallow |
| Paginated URLs beyond page 2-3 | Grows with content | Noindex or remove from sitemap |
| Session IDs in URLs | Per-visitor | Block in robots.txt |
| Tracking parameters | Per-campaign | Canonical to clean URL |
| Duplicate content with/without trailing slash | 2x all URLs | 301 redirect to canonical form |
| Legacy redirect targets still in sitemap | Each wastes a hop | Update sitemap to final URLs |
| Soft 404 pages | Crawled but wasted | Fix to return 404/410 or real content |
| Infinite scroll / calendar archives | Unbounded URL space | Pagination with disallow or noindex |
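The waste patterns in the table can be triaged automatically over a crawl or log export. A sketch with illustrative patterns - the parameter names and the pagination threshold are assumptions to tune per site:

```typescript
// Sketch: classify a URL into a crawl-budget waste category, or null
// if it matches no obvious waste pattern.
function classifyWaste(url: string): string | null {
  const u = new URL(url, 'https://example.com'); // base for relative paths
  const params = [...u.searchParams.keys()];
  if (params.some((p) => /^sessionid$/i.test(p))) return 'session-id';
  if (params.some((p) => /^utm_/i.test(p))) return 'tracking-parameter';
  if (params.some((p) => ['color', 'size', 'sort'].includes(p))) return 'faceted-navigation';
  const page = u.searchParams.get('page');
  if (page && Number(page) > 3) return 'deep-pagination'; // beyond page 2-3
  return null;
}
```

Aggregating the non-null results by category shows where your crawl budget is leaking before you write a single robots.txt rule.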
Crawl budget audit checklist
1. Pull log file data (see section 3 below)
2. Count unique URLs Googlebot fetched in the last 30 days
3. What % returned 200? Non-200 is wasted budget.
4. What % are canonical? Non-canonical crawls waste budget.
5. What % have noindex? These should ideally not be crawled.
6. Identify top 20 URL templates by volume - are these all valuable?
7. Check robots.txt is not accidentally allowing junk URLs
8. Check sitemap only contains canonical 200 pages

Prioritizing pages for crawl
Internal linking is the most powerful lever for influencing crawl priority. Pages with more internal links get crawled more often. Structure your internal linking to concentrate crawls on:
- High-value landing pages (highest revenue/conversion intent)
- Recently updated content (freshness signals)
- Newly published content (discovery)
- Deep pages in large catalogs
Deprioritize by removing internal links from:
- Thin pages with low-value content
- Near-duplicate pages
- Paginated pages beyond first 1-2 pages
3. Log File Analysis
Server access logs are the ground truth for what Googlebot actually crawled. More reliable than Search Console for diagnosing crawl issues.
Extract Googlebot requests from logs
```shell
# Apache/Nginx combined log format
# (UA strings can be spoofed - verify with reverse DNS for high-stakes audits)
grep "Googlebot" /var/log/nginx/access.log | \
  grep -v "AdsBot\|APIs-Google\|Mediapartners" > googlebot.log

# Count requests by URL pattern
awk '{print $7}' googlebot.log | \
  sed 's/?.*$//' | \
  sort | uniq -c | sort -rn | head -50

# Count by status code
awk '{print $9}' googlebot.log | sort | uniq -c | sort -rn

# Most crawled URLs on the day one week ago (adjust the date for other windows)
grep "$(date -d '7 days ago' '+%d/%b/%Y')" googlebot.log | \
  awk '{print $7}' | sort | uniq -c | sort -rn | head -100
```
Key metrics to compute from logs
| Metric | Formula | Target |
|---|---|---|
| Crawl efficiency | 200 responses / total requests | >90% |
| Canonical ratio | canonical-URL requests / total requests | >85% |
| Noindex crawl waste | noindex page requests / total requests | <5% |
| Redirect ratio | 3xx responses / total requests | <5% |
| Error ratio | 4xx + 5xx responses / total requests | <1% |
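Given aggregated counts from the log commands above, the status-based metrics (crawl efficiency, redirect ratio, error ratio) reduce to simple ratios. A sketch - the input shape is an assumption:

```typescript
// Sketch: compute the table's status-based crawl metrics from
// aggregated log counts.
interface LogCounts {
  total: number;
  status200: number;
  status3xx: number;
  status4xx: number;
  status5xx: number;
}

function crawlMetrics(c: LogCounts) {
  return {
    crawlEfficiency: c.status200 / c.total,           // target > 0.90
    redirectRatio: c.status3xx / c.total,             // target < 0.05
    errorRatio: (c.status4xx + c.status5xx) / c.total, // target < 0.01
  };
}
```

The canonical ratio and noindex-waste metrics need a second data source (your canonical URL list and noindex inventory) joined against the same log.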
Identify unexpected Googlebot behavior
```shell
# Find URLs Googlebot crawled that are NOT in your sitemap
# (potential orphan pages or leaked URLs)
# Note: strip scheme+host from sitemap <loc> values so both sides
# compare bare paths
comm -23 \
  <(awk '{print $7}' googlebot.log | sed 's/?.*$//' | sort -u) \
  <(grep "<loc>" sitemap.xml | sed 's/.*<loc>//;s/<\/loc>//;s|https\?://[^/]*||' | sort -u)

# Find sitemap URLs that Googlebot has NOT crawled in 30 days
# (potential crawl budget or discovery issue)
comm -23 \
  <(grep "<loc>" sitemap.xml | sed 's/.*<loc>//;s/<\/loc>//;s|https\?://[^/]*||' | sort -u) \
  <(awk '{print $7}' googlebot.log | sed 's/?.*$//' | sort -u)
```
4. Orphan Page Detection
Orphan pages are indexed pages with no internal links pointing to them. They receive no crawl signal from the internal link graph and may drop out of the index.
Detection methods
Method 1: Crawl your site, then compare to Search Console
```shell
# Crawl site with Screaming Frog or similar to get all internally linked URLs
# Export from Search Console: all indexed URLs
# Compare:
comm -13 <(sort crawled-internal-links.txt) <(sort search-console-indexed.txt)
# Lines appearing only in Search Console = orphan candidates
```
Method 2: Compare sitemap to crawl
Pages in your sitemap that are not discovered via internal link crawl are likely orphaned - they exist and are submitted, but nothing links to them.
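Both methods reduce to a set difference. A sketch, assuming you have the two URL lists as arrays:

```typescript
// Sketch: orphan candidates = URLs known to exist (sitemap or Search
// Console export) minus URLs reachable via internal links.
function orphanCandidates(knownUrls: string[], internallyLinked: string[]): string[] {
  const linked = new Set(internallyLinked);
  return knownUrls.filter((url) => !linked.has(url)).sort();
}
```

Normalize both lists first (see the canonicalization rules earlier) so a trailing-slash mismatch does not produce false orphans.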
Fixing orphan pages
Priority fix order:
- Add internal links from high-crawl-frequency pages (homepage, category pages)
- Add to a site-wide feature (footer links, related content, breadcrumbs)
- If the page has no value, remove it and 410 the URL
- If the page has value but no natural fit, add to XML sitemap (minimum signal)
5. Internal Linking for Crawlability
Internal links are how crawl budget propagates through your site. Think of it like PageRank - pages with more internal links get more crawl budget allocated.
Link architecture principles
Flat is better than deep: A page 6 clicks from the homepage will be crawled much less frequently than a page 2 clicks away. Keep important pages within 3 clicks of the homepage.
Crawl depth chart:
| Depth from homepage | Relative crawl frequency |
|---|---|
| 1 (homepage links) | Very high |
| 2 | High |
| 3 | Medium |
| 4 | Low |
| 5+ | Very low - potential orphan risk |
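Click depth is the shortest-path distance from the homepage over the internal link graph, computable with a breadth-first search. A sketch - the graph shape (URL to outbound internal links) is an assumption:

```typescript
// Sketch: compute click depth from the homepage via BFS. Pages absent
// from the result map are unreachable = orphans.
function clickDepths(graph: Record<string, string[]>, home: string): Map<string, number> {
  const depth = new Map<string, number>([[home, 0]]);
  const queue: string[] = [home];
  while (queue.length > 0) {
    const url = queue.shift()!;
    for (const next of graph[url] ?? []) {
      if (!depth.has(next)) {
        depth.set(next, depth.get(url)! + 1);
        queue.push(next); // first discovery = shortest click path
      }
    }
  }
  return depth;
}
```

Pages at depth 4+ (or missing entirely) are the ones to surface via new internal links.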
High-leverage internal link placements
| Placement | Why effective |
|---|---|
| Homepage | Highest crawl frequency, passes max crawl signal |
| Global navigation | Appears on every page, strong signal |
| Breadcrumbs | Systematic coverage of category hierarchy |
| Contextual body links | Strongest quality signal (topically related) |
| Sitelinks / footer | Network coverage, ensures every page is reachable |
| "Related content" modules | Connects content clusters |
Common internal linking failures
- JavaScript-only links: Links that exist only in JS event handlers or are injected by JS after page load may not be followed. Use `<a href="">` tags.
- Nofollow internal links: `rel="nofollow"` on internal links wastes crawl budget allocation. Reserve nofollow for external links you don't want to vouch for.
- Links in iframes: Googlebot may not follow links inside iframes.
- Links behind forms or auth: Googlebot cannot submit forms or authenticate. Any page requiring form submission to reach is effectively unreachable.
6. Soft 404 Handling
A soft 404 is a page that returns HTTP 200 but contains content indicating the page does not exist (empty search results, "product not found", etc.). Google detects these and treats them poorly.
Detection
In Google Search Console, soft 404s appear under Index Coverage > "Crawled - not currently indexed" with the reason "Soft 404". Also detectable in log analysis: URLs returning 200 with very short response body size.
Fix strategies
| Scenario | Fix |
|---|---|
| Out-of-stock product page | Keep page, show alternatives. Do not soft 404. |
| Discontinued product, no alternative | Return 404 or 410, or 301 to category |
| Empty search results page | Noindex via meta tag, or 200 with good content |
| User profile that no longer exists | Return 404 |
| Category with no products | Either 404 or 301 to parent, not empty 200 |
```js
// Next.js App Router: return a proper 404 instead of an empty 200 page
import { notFound } from 'next/navigation';

export async function generateMetadata({ params }) {
  const product = await getProduct(params.slug);
  if (!product) {
    notFound(); // triggers a 404 response and renders not-found.tsx
  }
}
```
7. Index Coverage Report Interpretation
Search Console Index Coverage report categories:
| Status | Meaning | Action |
|---|---|---|
| Valid | Indexed and serving | Monitor for unexpected drops |
| Valid with warning | Indexed but flagged (e.g., submitted in sitemap but canonical is different) | Fix the canonical mismatch |
| Excluded - Crawled, not indexed | Google crawled but chose not to index | Improve content quality or add noindex intentionally |
| Excluded - Discovered, not crawled | In queue but not yet crawled | Improve crawl budget or add more internal links |
| Excluded - Duplicate | Treated as duplicate of another URL | Check canonical chain |
| Excluded - Noindex | noindex directive found | Intentional? If not, remove the directive |
| Error - 404 | Page returned 404 | Remove from sitemap if intentional, fix if accidental |
| Error - Soft 404 | 200 but thin/empty content | See soft 404 section above |
| Error - Redirect error | Redirect loop or chain too long | Fix redirects |
| Error - robots.txt blocked | Blocked from crawling | Intentional? If not, update robots.txt |
Monitoring workflow
Set up weekly review of:
- Total indexed count trend - sudden drops signal a problem
- New errors appearing in the report
- "Discovered, not crawled" volume - high numbers suggest crawl budget constraints
- "Excluded - Duplicate" volume - high numbers suggest canonicalization issues
rendering-strategies.md
Rendering Strategies for SEO Reference
How a page is rendered determines whether Googlebot sees your content immediately or must wait for a delayed JavaScript render. This reference covers the rendering strategies, their SEO trade-offs, framework implementations, and edge cases.
1. Rendering Strategy Comparison Matrix
| Strategy | Full name | HTML on first crawl? | Build time impact | Data freshness | SEO risk |
|---|---|---|---|---|---|
| SSG | Static Site Generation | Yes - full HTML | Scales with page count | Stale until rebuild | None |
| SSR | Server-Side Rendering | Yes - full HTML | None | Always fresh | None |
| ISR | Incremental Static Regeneration | Yes (on cache hit) | Low - builds on demand | Stale until revalidation | Low - stale cache |
| CSR | Client-Side Rendering | No - empty shell | None | Always fresh | High for rankable pages |
| Hybrid SSG+CSR | Static shell, client fetches data | Partial - shell only | Low | Fresh after hydration | Medium - depends on what CSR fetches |
| Edge SSR | SSR at CDN edge | Yes - full HTML | None | Always fresh | None |
| Dynamic rendering | Serve HTML to bots, CSR to users | Yes (for Googlebot) | None | Always fresh | Medium - requires maintenance |
2. Decision Framework by Content Type
```
Is this page behind authentication?
  YES -> CSR is fine. Google cannot and should not index it.
  NO  -> Continue...

Does this page need to rank in search results?
  NO  -> Any strategy works. Choose based on UX requirements.
  YES -> Continue...

Does the content change frequently (multiple times per day)?
  YES -> SSR or Edge SSR
  NO  -> Continue...

Is your page count large enough that SSG builds take >10 minutes?
  YES (100k+ pages) -> ISR
  NO  -> SSG (safest, simplest)
```
Content type recommendations
| Page type | Strategy | Reasoning |
|---|---|---|
| Homepage | SSG | Rarely changes, highest visibility, fastest TTFB |
| Landing pages | SSG | Infrequent changes, need instant indexing |
| Blog posts / articles | SSG | Write-once, ideal for build-time generation |
| Documentation | SSG | Static content, version-controlled |
| Product detail pages (small catalog) | SSG | Fast, always-indexed |
| Product detail pages (large catalog) | ISR | Build-time impractical at scale |
| Category / listing pages | SSG or ISR | Depends on how often product mix changes |
| News / feed pages | SSR | Must reflect current state |
| User profiles (public) | SSR | Personalized, indexed, dynamic |
| Search results pages | SSR + canonical | Crawlable version needed |
| Account/dashboard pages | CSR | Not indexed, no SEO concern |
| Admin interfaces | CSR | Not indexed, no SEO concern |
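The decision framework above can be encoded directly. A sketch - the trait names are illustrative:

```typescript
// Sketch: the rendering-strategy decision tree as a function.
interface PageTraits {
  behindAuth: boolean;
  needsToRank: boolean;
  changesManyTimesPerDay: boolean;
  hugeCatalog: boolean; // e.g. 100k+ pages, SSG builds > 10 min
}

function chooseRenderingStrategy(p: PageTraits): string {
  if (p.behindAuth) return 'CSR';           // not indexed anyway
  if (!p.needsToRank) return 'any';         // pick on UX grounds
  if (p.changesManyTimesPerDay) return 'SSR';
  if (p.hugeCatalog) return 'ISR';
  return 'SSG';                             // safest default for rankable pages
}
```

The ordering matters: authentication short-circuits everything, and SSG is the fall-through default because it carries zero SEO risk.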
3. Framework Implementation Guide
Next.js (App Router)
// SSG: page is generated at build time (default for static data)
// File: app/blog/[slug]/page.tsx
export async function generateStaticParams() {
const posts = await getPosts();
return posts.map((post) => ({ slug: post.slug }));
}
export default async function BlogPost({ params }) {
const post = await getPost(params.slug); // fetched at build time
return <article>{post.content}</article>;
}// ISR: regenerate on access, revalidate after N seconds
// File: app/products/[slug]/page.tsx
export const revalidate = 3600; // revalidate every hour
export default async function ProductPage({ params }) {
const product = await getProduct(params.slug);
return <ProductDetail product={product} />;
}
// On-demand ISR: trigger revalidation from a webhook
// File: app/api/revalidate/route.ts
import { revalidatePath } from 'next/cache';
export async function POST(request: Request) {
const { slug } = await request.json();
revalidatePath(`/products/${slug}`);
return Response.json({ revalidated: true });
}
// SSR: render on every request
// File: app/news/page.tsx
export const dynamic = 'force-dynamic';
export default async function NewsPage() {
const articles = await getLatestNews(); // fetched on every request
return <NewsFeed articles={articles} />;
}
// Metadata for SEO - works with all rendering strategies
// File: app/products/[slug]/page.tsx
import { Metadata } from 'next';
export async function generateMetadata({ params }): Promise<Metadata> {
const product = await getProduct(params.slug);
return {
title: product.name,
description: product.description,
alternates: {
canonical: `https://example.com/products/${params.slug}`,
},
openGraph: {
title: product.name,
description: product.description,
images: [product.imageUrl],
},
};
}
Next.js (Pages Router, legacy)
// SSG with getStaticProps
export async function getStaticProps({ params }) {
const product = await getProduct(params.slug);
return {
props: { product },
revalidate: 3600, // ISR: revalidate every hour
};
}
// SSR with getServerSideProps
export async function getServerSideProps({ params }) {
const news = await getLatestNews();
return { props: { news } };
}
Nuxt 3
// SSG: generate at build time
// nuxt.config.ts
export default defineNuxtConfig({
nitro: {
prerender: {
crawlLinks: true,
routes: ['/sitemap.xml'],
},
},
});
// pages/products/[slug].vue
const { data: product } = await useAsyncData(
`product-${route.params.slug}`,
() => $fetch(`/api/products/${route.params.slug}`)
);
// ISR-equivalent with cache headers
// server/routes/products/[slug].ts
export default defineCachedEventHandler(async (event) => {
const slug = getRouterParam(event, 'slug');
return await getProduct(slug);
}, {
maxAge: 60 * 60, // 1 hour
staleMaxAge: 60 * 60 * 24, // serve stale for 24h while revalidating
});
<!-- SEO meta in Nuxt -->
<script setup>
useSeoMeta({
title: product.name,
description: product.description,
ogTitle: product.name,
ogDescription: product.description,
});
useHead({
link: [
{ rel: 'canonical', href: `https://example.com/products/${product.slug}` }
]
});
</script>
Astro
Astro defaults to SSG with optional server-side rendering per route.
---
// src/pages/products/[slug].astro
// SSG: default behavior - rendered at build time
export async function getStaticPaths() {
const products = await getProducts();
return products.map(product => ({
params: { slug: product.slug },
props: { product },
}));
}
const { product } = Astro.props;
---
<html>
<head>
<title>{product.name}</title>
<meta name="description" content={product.description} />
<link rel="canonical" href={`https://example.com/products/${product.slug}`} />
</head>
<body>
<h1>{product.name}</h1>
</body>
</html>
// astro.config.mjs - enable SSR for specific routes
export default defineConfig({
output: 'hybrid', // SSG by default, SSR opt-in
adapter: node({ mode: 'standalone' }),
});
---
// Enable SSR for this specific page
export const prerender = false; // override default SSG
const news = await fetch('/api/news').then(r => r.json());
---
Remix
Remix is SSR-first. All routes render on the server by default.
// app/routes/products.$slug.tsx
import { LoaderFunctionArgs, MetaFunction } from '@remix-run/node';
import { useLoaderData } from '@remix-run/react';
export async function loader({ params }: LoaderFunctionArgs) {
const product = await getProduct(params.slug!);
if (!product) throw new Response('Not Found', { status: 404 });
return product;
}
export const meta: MetaFunction<typeof loader> = ({ data }) => {
return [
{ title: data?.name },
{ name: 'description', content: data?.description },
{ tagName: 'link', rel: 'canonical', href: `https://example.com/products/${data?.slug}` },
];
};
export default function ProductPage() {
const product = useLoaderData<typeof loader>();
return <ProductDetail product={product} />;
}
For SSG-like behavior in Remix, use HTTP caching headers:
export function headers() {
return {
'Cache-Control': 'public, max-age=3600, s-maxage=86400, stale-while-revalidate=86400',
};
};
4. Edge Rendering
Edge rendering runs SSR at CDN edge nodes geographically close to the user. SEO characteristics are identical to standard SSR - Googlebot receives full HTML. The benefit is performance: lower TTFB improves Core Web Vitals, a positive ranking signal.
When to use edge rendering
- SSR pages where TTFB > 200ms from major geographic markets
- Personalized content that must be indexed (e.g., localized pricing)
- Edge A/B testing where Googlebot must consistently receive a single variant
Edge rendering in Next.js
// app/products/[slug]/page.tsx
export const runtime = 'edge';
export const dynamic = 'force-dynamic';
export default async function ProductPage({ params }) {
const product = await getProduct(params.slug);
return <ProductDetail product={product} />;
}
Constraints: Edge runtime does not support all Node.js APIs. Avoid database connections in edge functions; use edge-compatible APIs or KV stores instead.
5. Dynamic Rendering (Escape Hatch)
Dynamic rendering serves pre-rendered HTML to crawlers and CSR to regular users. It is an escape hatch for legacy apps that cannot easily migrate to SSR/SSG.
Implementation with a rendering proxy
# Route Googlebot and other bots to a pre-rendering service
map $http_user_agent $is_bot {
default 0;
~*googlebot 1;
~*bingbot 1;
~*slurp 1;
~*duckduckbot 1;
}
server {
location / {
if ($is_bot) {
proxy_pass https://prerender-service.example.com;
}
# Regular users get the CSR app
try_files $uri $uri/ /index.html;
}
}
Risks of dynamic rendering
- Cloaking concern: Serving different HTML to bots vs users. Google explicitly permits dynamic rendering as a workaround but views SSR as the correct long-term solution. If the bot version is significantly different from the user version, it may be treated as cloaking.
- Maintenance burden: Two rendering paths to maintain. The bot version can diverge from the user version over time.
- Prerender service latency: Adds latency to Googlebot crawls.
Use dynamic rendering only as a transitional strategy while migrating to SSR/SSG.
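The nginx map above can also live in application middleware. A minimal TypeScript sketch of the bot check - the pattern list is illustrative and not exhaustive, and user-agent matching alone is spoofable (verify Googlebot via reverse DNS for anything security-sensitive):

```typescript
// Known-crawler patterns, mirroring the nginx map above.
// This list is an illustrative subset, not a complete inventory.
const BOT_PATTERNS: RegExp[] = [
  /googlebot/i,
  /bingbot/i,
  /slurp/i,       // Yahoo
  /duckduckbot/i,
];

export function isKnownBot(userAgent: string | undefined): boolean {
  if (!userAgent) return false;
  return BOT_PATTERNS.some((re) => re.test(userAgent));
}
```

In a middleware layer, a `true` result would route the request to the prerender proxy; everyone else falls through to the CSR app.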
6. Hydration and SEO Implications
Hydration is the process of attaching React (or other framework) event handlers to server-rendered HTML in the browser. It does not affect what Googlebot sees on the first crawl but it affects user experience metrics.
Hydration failures that affect SEO
Hydration mismatch: When the server-rendered HTML differs from what React renders on the client, React discards the server HTML and re-renders from scratch. During this process, content may disappear briefly, which can affect Core Web Vitals.
// Common cause: branching on client-only mount state, so the server
// renders nothing where the client renders content
import { useEffect, useState } from 'react';

function MountGate({ children }) {
  const [mounted, setMounted] = useState(false);
  useEffect(() => setMounted(true), []);
  // WRONG: server HTML is empty, client renders children -> hydration mismatch
  return mounted ? children : null;
}

// BETTER: render the content on the server and enhance on the client.
// suppressHydrationWarning is for values that legitimately differ between
// server and client (e.g. timestamps), not a blanket escape hatch.
function Content({ children }) {
  return <div suppressHydrationWarning>{children}</div>;
}
Lazy-loaded content and indexing: Content loaded via React.lazy or dynamic imports only becomes available after the JavaScript bundle loads. For Googlebot's second-wave rendering, this content is typically visible. But for pages where immediate indexing is critical, keep primary content in the initial server-rendered output.
Core Web Vitals and rendering
| Metric | Affected by rendering strategy |
|---|---|
| LCP (Largest Contentful Paint) | SSG/SSR have faster LCP; CSR often has poor LCP |
| CLS (Cumulative Layout Shift) | Hydration mismatches cause CLS; SSG/SSR avoid this |
| INP (Interaction to Next Paint) | Mostly a JavaScript execution concern, not rendering strategy |
| TTFB (Time to First Byte) | SSG fastest (static file); SSR variable; CSR fast shell |
LCP is a ranking signal. CSR pages frequently have poor LCP because the browser must download, parse, and execute JavaScript before painting the largest content element. SSG pages typically have strong LCP.
sitemaps-canonicals.md
Sitemaps and Canonical URLs Reference
Detailed implementation guide for XML sitemaps and canonical URL strategy. These two mechanisms work together to tell search engines which URLs are authoritative and how often to re-crawl them.
1. XML Sitemap Specification
Structure and limits
| Constraint | Value |
|---|---|
| Max URLs per sitemap file | 50,000 |
| Max file size (uncompressed) | 50 MB |
| Max file size (gzipped) | 50 MB after decompression |
| Max sitemaps in a sitemap index | 50,000 |
| Max total URLs across all sitemaps | 2.5 billion (50,000 x 50,000) |
In practice: compress all sitemaps with gzip. A 50 MB uncompressed sitemap compresses to ~5 MB and is fetched much faster.
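A sketch of generating and gzipping a sitemap in Node - `buildSitemap` and its URL-record shape are illustrative helpers, not a real library API:

```typescript
import { gzipSync, gunzipSync } from 'node:zlib';

// Escape the five characters the sitemap spec requires escaping in <loc>.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

export function buildSitemap(urls: { loc: string; lastmod?: string }[]): string {
  const entries = urls
    .map((u) => {
      const lastmod = u.lastmod ? `<lastmod>${u.lastmod}</lastmod>` : '';
      return `<url><loc>${escapeXml(u.loc)}</loc>${lastmod}</url>`;
    })
    .join('');
  return `<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${entries}</urlset>`;
}

// Serve the gzipped bytes with Content-Type: application/xml
// and Content-Encoding: gzip (or name the file sitemap.xml.gz).
export function gzipSitemap(xml: string): Buffer {
  return gzipSync(Buffer.from(xml, 'utf8'));
}
```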
Full URL element attributes
<url>
<!-- Required -->
<loc>https://example.com/products/widget</loc>
<!-- Optional: date of last meaningful content change, ISO 8601 -->
<lastmod>2024-01-15</lastmod>
<!-- Optional: hint for recrawl frequency (Google largely ignores this) -->
<changefreq>weekly</changefreq>
<!-- Optional: relative importance 0.0-1.0 (Google largely ignores this) -->
<priority>0.8</priority>
</url>
changefreq and priority are hints only. Google's crawl scheduling is based
primarily on its own signals. Spend time on accurate lastmod instead.
lastmod accuracy rules
Always update lastmod when: Title changes, body content substantially changes, main image changes, structured data changes.
Do NOT update lastmod when: Template/CSS/JS changes that do not affect page content, sidebar changes that appear on all pages, navigation updates.
Why accuracy matters: Googlebot tracks whether your lastmod values correlate
with actual content changes. If you always set lastmod to today's date, Googlebot
learns to ignore it. If it accurately reflects real changes, Googlebot uses it to
prioritize recrawls.
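One way to enforce these rules is to derive lastmod from a hash of the meaningful-content fields, so template-only deploys never touch it. A sketch, assuming a hypothetical CMS record shape:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical CMS document shape - adapt to your own data model.
interface PageRecord {
  title: string;
  body: string;
  mainImageUrl: string;
  contentHash?: string; // hash recorded at the previous publish
  lastmod?: string;     // ISO 8601 date of last meaningful change
}

export function withAccurateLastmod(page: PageRecord, today: string): PageRecord {
  // Hash only the fields that count as meaningful content; template,
  // sidebar, and navigation changes never enter the hash.
  const hash = createHash('sha256')
    .update(page.title)
    .update('\0')
    .update(page.body)
    .update('\0')
    .update(page.mainImageUrl)
    .digest('hex');

  if (hash === page.contentHash) return page; // nothing meaningful changed
  return { ...page, contentHash: hash, lastmod: today };
}
```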
Sitemap index for large sites
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<!-- Group by content type for easier monitoring -->
<sitemap>
<loc>https://example.com/sitemaps/products-1.xml</loc>
<lastmod>2024-01-15</lastmod>
</sitemap>
<sitemap>
<loc>https://example.com/sitemaps/products-2.xml</loc>
<lastmod>2024-01-15</lastmod>
</sitemap>
<sitemap>
<loc>https://example.com/sitemaps/blog.xml</loc>
<lastmod>2024-01-14</lastmod>
</sitemap>
<sitemap>
<loc>https://example.com/sitemaps/categories.xml</loc>
<lastmod>2024-01-01</lastmod>
</sitemap>
</sitemapindex>
Group sitemaps by content type, not just by size. This lets you monitor indexing rates per content type in Search Console.
What to include (and exclude) from sitemaps
Include:
- All canonical, indexable URLs (returning 200 with real content)
- URLs you actively want Google to discover and crawl
Exclude:
- URLs with a noindex directive (contradictory signal)
- URLs that redirect (include only the final destination)
- Paginated pages beyond page 1 in most cases
- Faceted navigation URLs you want to noindex
- Duplicate content URLs (include only the canonical form)
- Parameter variants (include only the clean URL)
Including non-canonical URLs in your sitemap is a strong anti-pattern. It confuses Googlebot about which version is authoritative.
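These rules can be encoded as a filter over a crawl inventory. A sketch - the `CrawlRecord` shape is a hypothetical audit-tool export, not a real API:

```typescript
// Hypothetical crawl-audit record for one URL.
interface CrawlRecord {
  url: string;
  status: number;     // final HTTP status
  noindex: boolean;   // from meta robots or X-Robots-Tag
  canonical?: string; // declared canonical, if any
}

export function sitemapEligible(r: CrawlRecord): boolean {
  if (r.status !== 200) return false;                     // exclude redirects and errors
  if (r.noindex) return false;                            // contradictory signal
  if (r.canonical && r.canonical !== r.url) return false; // non-canonical variant
  return true;
}
```

Run this over the full inventory before each sitemap regeneration so non-canonical URLs never slip in.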
Submission methods
robots.txt (recommended - always active):
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-index.xml
Google Search Console - Submit via the Sitemaps report. Best for initial submission and monitoring indexing rates per sitemap file.
HTTP ping (deprecated):
https://www.google.com/ping?sitemap=https://example.com/sitemap.xml
Google deprecated this endpoint in 2023 - use robots.txt or Search Console instead.
Dynamic sitemap generation (Next.js example)
// app/sitemap.ts
import { MetadataRoute } from 'next';
export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
const products = await getProducts(); // fetch from CMS/DB
const productUrls = products.map((product) => ({
url: `https://example.com/products/${product.slug}`,
lastModified: product.updatedAt,
changeFrequency: 'weekly' as const,
priority: 0.8,
}));
return [
{
url: 'https://example.com',
lastModified: new Date(),
changeFrequency: 'daily',
priority: 1,
},
...productUrls,
];
}
For very large catalogs (100k+ pages), generate sitemaps statically during build or on a scheduled job rather than dynamically on each request. Dynamic generation under load can become a bottleneck.
2. Canonical URL Rules
The canonical URL is the "master" URL for a piece of content. All ranking signals (links, crawl frequency, page experience data) accumulate on the canonical URL.
The canonical signal hierarchy
Google uses multiple signals to determine the canonical. In order of weight:
- 301 redirect - Strongest signal. URL A redirects to URL B = B is canonical.
- <link rel="canonical"> tag - Strong hint. Not a directive - Google may override it if the canonical seems wrong.
- Sitemap inclusion - Weak hint. Being in the sitemap suggests a URL is canonical.
- Internal links - Weak hint. The URL you link to most often is treated as canonical.
If these signals conflict, Google picks what it considers the most reliable signal, which may differ from your intent.
Self-referencing canonicals
Every page should declare a self-referencing canonical, even if there is no duplicate content risk. This is defensive practice:
<!-- On https://example.com/products/widget -->
<link rel="canonical" href="https://example.com/products/widget" />
<!-- On https://example.com/blog/post-title -->
<link rel="canonical" href="https://example.com/blog/post-title" />
URL variants that require canonical handling
Every site has implicit URL variants. You must pick one form and consistently canonicalize all others to it:
| Variant type | Examples | Pick one form |
|---|---|---|
| Protocol | http:// vs https:// | Always HTTPS |
| Subdomain | www.example.com vs example.com | Pick one, 301 the other |
| Trailing slash | /products/widget/ vs /products/widget | Pick one, be consistent |
| Query parameters | ?ref=footer vs clean URL | Clean URL is canonical |
| Capitalization | /Products/Widget vs /products/widget | Lowercase, enforce at server |
| Index files | /about/index.html vs /about/ | Remove index.html from URLs |
| Fragment | /page#section vs /page | Fragment shares the same canonical as /page |
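Picking one form is only half the job; the server must also enforce it. A sketch of a normalizer implementing one possible policy (HTTPS, non-www, no trailing slash, lowercase paths, tracking parameters stripped, fragment dropped) - adjust the rules to your own chosen standard:

```typescript
// Tracking parameters to strip - an illustrative list, extend as needed.
const TRACKING_PARAMS = new Set(['ref', 'utm_source', 'utm_medium', 'utm_campaign']);

export function canonicalize(raw: string): string {
  const u = new URL(raw);
  u.protocol = 'https:';                              // always HTTPS
  u.hostname = u.hostname.replace(/^www\./, '');      // non-www
  u.pathname = u.pathname
    .toLowerCase()                                    // lowercase paths
    .replace(/\/index\.html$/, '/')                   // drop index files
    .replace(/\/+$/, '') || '/';                      // no trailing slash (keep root)
  u.hash = '';                                        // fragments never change the canonical
  for (const key of [...u.searchParams.keys()]) {
    if (TRACKING_PARAMS.has(key)) u.searchParams.delete(key);
  }
  return u.toString();
}
```

The same function can back both the 301-redirect layer and the canonical-tag generator, so the two signals can never disagree.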
Cross-domain canonicals
For content syndicated or duplicated across domains:
<!-- On https://partner.com/article/original-content -->
<!-- If the original lives on example.com, declare: -->
<link rel="canonical" href="https://example.com/article/original-content" />
Cross-domain canonicals tell Google to attribute all signals to the original domain. Use this for:
- Syndicated content (press releases, guest posts)
- AMP pages canonicalizing to regular pages
- Country-specific domains that share content with the main domain
Common canonical mistakes
Canonical chain: A -> B -> C
<!-- Page A -->
<link rel="canonical" href="https://example.com/page-b" />
<!-- Page B -->
<link rel="canonical" href="https://example.com/page-c" />
Google may follow the chain or stop at B. Always point the canonical directly at the final URL.
Canonical to a redirect:
<!-- Canonicalizing to a URL that itself 301s to somewhere else -->
<link rel="canonical" href="https://example.com/old-page" />
<!-- old-page 301s to /new-page -->
The canonical should point to the final destination, not an intermediate redirect.
Canonical to a noindex page:
<!-- Canonicalizing to a page that has noindex -->
<link rel="canonical" href="https://example.com/noindex-page" />
Contradictory signals. Google may ignore the canonical, or treat the noindex as applying to the whole duplicate cluster and drop the current page from the index too.
Inconsistent canonical with 301:
<!-- URL A 301 redirects to URL C -->
<!-- But URL B has <link rel="canonical" href="URL A"> -->
<!-- Google sees: B -> canonical -> A -> redirect -> C -->
Always align redirects and canonical tags to point to the same final URL.
3. Pagination Handling
Modern approach (preferred): Self-canonical per page
Each paginated page is canonicalized to itself. This is the current Google recommendation.
Google retired the rel="prev" and rel="next" pattern in 2019; it is no longer used as an indexing signal.
<!-- /products?page=1 -->
<link rel="canonical" href="https://example.com/products" />
<!-- /products?page=2 -->
<link rel="canonical" href="https://example.com/products?page=2" />
<!-- /products?page=3 -->
<link rel="canonical" href="https://example.com/products?page=3" />
Only canonicalize page 2+ to page 1 if they contain identical or near-identical content. If each page has 20 unique products, each page has distinct value and should be self-canonical.
When to noindex paginated pages
Noindex paginated pages when:
- Pages beyond 2-3 receive no organic traffic
- Content is highly similar across pages
- You need to conserve crawl budget
Do NOT noindex page 1 or pages that rank for head terms.
4. hreflang and Canonicals Interaction
hreflang tells Google which language/region variant to serve to which users. It interacts with canonicals and must be consistent.
Correct hreflang setup
<!-- On https://example.com/en/products/widget (English, US) -->
<link rel="alternate" hreflang="en-US" href="https://example.com/en/products/widget" />
<link rel="alternate" hreflang="en-GB" href="https://example.com/en-gb/products/widget" />
<link rel="alternate" hreflang="de" href="https://example.com/de/products/widget" />
<link rel="alternate" hreflang="x-default" href="https://example.com/products/widget" />
<link rel="canonical" href="https://example.com/en/products/widget" />
Rules for hreflang:
- All language variants must reference all other language variants (reciprocal)
- hreflang does not change the canonical - each language page canonicals to itself
- Do not canonicalize an FR page to an EN page (that removes the FR page from the index)
- x-default is the fallback for users who don't match any specific language
Common hreflang failure
<!-- WRONG: FR page canonicalizing to the EN URL -->
<!-- On /fr/products/widget: -->
<link rel="canonical" href="https://example.com/en/products/widget" />
<link rel="alternate" hreflang="fr" href="https://example.com/fr/products/widget" />
This tells Google "this FR page is a duplicate of the EN page" and the FR page will not be indexed for French users. Each locale page should be self-canonical.
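Because every variant must list every other variant, generating the tag set from a single locale map is less error-prone than hand-editing each page. A sketch - the locale-map shape is an assumption, not a standard API:

```typescript
// Generate the full reciprocal hreflang set for one piece of content.
// Render the same set on every locale variant of the page.
export function hreflangLinks(
  locales: Record<string, string>, // hreflang code -> absolute URL
  xDefault: string                 // fallback URL for unmatched users
): string[] {
  const links = Object.entries(locales).map(
    ([lang, href]) => `<link rel="alternate" hreflang="${lang}" href="${href}" />`
  );
  links.push(`<link rel="alternate" hreflang="x-default" href="${xDefault}" />`);
  return links;
}
```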
hreflang in XML sitemaps
For large multilingual sites, managing hreflang in sitemaps is more maintainable than per-page HTML tags:
<url>
<loc>https://example.com/en/products/widget</loc>
<xhtml:link rel="alternate" hreflang="en-US" href="https://example.com/en/products/widget"/>
<xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/products/widget"/>
<xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/products/widget"/>
</url>
Add xmlns:xhtml="http://www.w3.org/1999/xhtml" to the <urlset> opening tag.
5. Canonical Validation Checklist
Before deploying canonical changes:
- All canonical tags point to absolute URLs (including protocol)
- Canonical target returns HTTP 200
- Canonical target is not itself noindexed
- Canonical target does not redirect elsewhere
- Canonical tag is in the <head>, not the <body>
- Only one canonical tag per page (duplicate canonicals are ignored)
- Canonical URL form is consistent with your chosen URL standard (www/non-www, trailing slash)
- Sitemap URLs match their declared canonical URLs exactly
- hreflang alternate URLs match the canonical URL for that locale (not a different locale's canonical)
- No canonical loops (A -> B -> A)
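The chain and loop checks from this list can be automated over a map of declared canonicals. A sketch - the input shape (URL to declared canonical target) is an assumption about how you would export the data from a crawl:

```typescript
// Scan a canonical map for chains (A -> B -> C) and loops (A -> B -> A).
// Self-referencing canonicals and direct one-hop canonicals pass.
export function findCanonicalFaults(
  canonicals: Record<string, string>
): { chains: string[]; loops: string[] } {
  const chains: string[] = [];
  const loops: string[] = [];
  for (const [url, target] of Object.entries(canonicals)) {
    if (target === url) continue; // self-canonical is fine
    const next = canonicals[target];
    if (next === undefined || next === target) continue; // resolves in one hop
    if (next === url) loops.push(url); // target points back at us
    else chains.push(url);             // target canonicals somewhere else again
  }
  return { chains, loops };
}
```

Wiring this into CI catches canonical regressions before they reach Googlebot.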
Frequently Asked Questions
What is technical-seo?
Use this skill when working on technical SEO infrastructure - crawlability, indexing, XML sitemaps, canonical URLs, robots.txt, redirect chains, rendering strategies (SSR/SSG/ISR/CSR), crawl budget optimization, and search engine rendering. Triggers on fixing indexing issues, configuring crawl directives, choosing rendering strategies for SEO, debugging Google Search Console errors, or auditing site architecture for search engines.
How do I install technical-seo?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill technical-seo in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support technical-seo?
technical-seo works with claude-code, gemini-cli, openai-codex, mcp. Install it once and use it across any supported AI coding agent.