technical-seo
Use this skill when working on technical SEO infrastructure - crawlability, indexing, XML sitemaps, canonical URLs, robots.txt, redirect chains, rendering strategies (SSR/SSG/ISR/CSR), crawl budget optimization, and search engine rendering. Triggers on fixing indexing issues, configuring crawl directives, choosing rendering strategies for SEO, debugging Google Search Console errors, or auditing site architecture for search engines.
technical-seo
technical-seo is a production-ready AI agent skill for claude-code, gemini-cli, openai-codex, and MCP. It covers working on technical SEO infrastructure - crawlability, indexing, XML sitemaps, canonical URLs, robots.txt, redirect chains, rendering strategies (SSR/SSG/ISR/CSR), crawl budget optimization, and search engine rendering.
Quick Facts
| Field | Value |
|---|---|
| Category | marketing |
| Version | 0.1.0 |
| Platforms | claude-code, gemini-cli, openai-codex, mcp |
| License | MIT |
How to Install
- Make sure you have Node.js installed on your machine.
- Run the following command in your terminal:
npx skills add AbsolutelySkilled/AbsolutelySkilled --skill technical-seo
- The technical-seo skill is now available in your AI coding agent (Claude Code, Gemini CLI, OpenAI Codex, etc.).
Overview
The infrastructure layer of SEO. Technical SEO ensures search engines can discover, crawl, render, and index your pages. It is the foundation - if crawling fails, content quality and link building are irrelevant. This skill covers the crawl-index-rank pipeline and the engineering decisions that make or break search visibility.
Tags
seo technical-seo crawlability sitemaps canonicals rendering indexing
Platforms
- claude-code
- gemini-cli
- openai-codex
- mcp
Frequently Asked Questions
What is technical-seo?
technical-seo is an AI agent skill for technical SEO infrastructure work: crawlability, indexing, XML sitemaps, canonical URLs, robots.txt, redirect chains, rendering strategies (SSR/SSG/ISR/CSR), and crawl budget optimization. It triggers on fixing indexing issues, configuring crawl directives, choosing rendering strategies for SEO, debugging Google Search Console errors, or auditing site architecture for search engines.
How do I install technical-seo?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill technical-seo in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support technical-seo?
This skill works with claude-code, gemini-cli, openai-codex, mcp. Install it once and use it across any supported AI coding agent.
Maintainers
Generated from AbsolutelySkilled
SKILL.md
Technical SEO
The infrastructure layer of SEO. Technical SEO ensures search engines can discover, crawl, render, and index your pages. It is the foundation - if crawling fails, content quality and link building are irrelevant. This skill covers the crawl-index-rank pipeline and the engineering decisions that make or break search visibility.
When to use this skill
Trigger this skill when the user:
- Reports pages not showing in Google Search or Index Coverage errors in Search Console
- Needs to configure or debug `robots.txt` directives
- Wants to generate or fix an XML sitemap
- Is setting up canonical URLs or resolving duplicate content issues
- Has redirect chains or wants to audit redirects
- Is choosing a rendering strategy (SSR, SSG, ISR, CSR) with SEO as a constraint
- Is debugging why Googlebot cannot see content that users can
- Wants to optimize crawl budget on a large site (10k+ pages)
Do NOT trigger this skill for:
- Content strategy, editorial calendars, or keyword research
- Link building, backlink analysis, or off-page SEO
Key principles
Crawlable before rankable - A page that Googlebot cannot reach cannot rank. Discovery is step one in the pipeline. Fix crawl and index issues before any other SEO work. Crawlability is a precondition, not a ranking factor.
One canonical URL per piece of content - Every distinct piece of content must have exactly one URL that all signals consolidate on. HTTP vs HTTPS, www vs non-www, trailing slash vs none, query parameters - each variant dilutes ranking signals unless canonicalized to a single source of truth.
Rendering strategy is an SEO architecture decision - Whether your page is rendered at build time (SSG), at request time on the server (SSR), or in the browser (CSR) determines whether Googlebot sees your content on the first crawl or must wait for a second-wave JavaScript render. Make this decision deliberately.
robots.txt blocks crawling, not indexing - A page blocked in `robots.txt` can still be indexed if other pages link to it. Googlebot sees the URL via links but cannot read the content, so it may index a thin or empty page. Use `noindex` in the HTTP response header or meta tag to prevent indexing, not `robots.txt`.
Redirect chains waste crawl budget and dilute link equity - Each hop in a redirect chain costs crawl budget and reduces the link equity passed through. Keep all redirects as single-hop 301s from old URL directly to final destination.
Core concepts
The crawl-index-rank pipeline
Three sequential phases - failure in any phase stops everything downstream:
| Phase | What happens | Common failure modes |
|---|---|---|
| Crawl | Googlebot discovers and fetches the URL | robots.txt block, slow server, crawl budget exhausted |
| Index | Google processes and stores the page | noindex directive, duplicate content, thin content, render failure |
| Rank | Google assigns position for queries | Content quality, E-E-A-T, links, page experience |
Crawl budget
Crawl budget is the number of URLs Googlebot will crawl on your site within a given timeframe. It is a product of crawl rate (how fast Googlebot can crawl without overloading the server) and crawl demand (how much Google wants to crawl based on page value and freshness).
Who needs to care about crawl budget:
- Sites with 10k+ pages
- Sites with large faceted navigation generating URL permutations
- Sites with many low-value or duplicate URLs (pagination, filters, sessions in URLs)
- Sites with frequent content updates that need fast re-indexing
Small sites (<1k pages) with clean architecture rarely face crawl budget problems.
Rendering for crawlers
Googlebot can execute JavaScript but does so in a second wave, sometimes days after the initial crawl. Content invisible without JavaScript is at risk:
| Rendering | Googlebot sees on first crawl | SEO risk |
|---|---|---|
| SSG (static) | Full HTML | None |
| SSR (server-side) | Full HTML | None |
| ISR (incremental static) | Full HTML (on cache hit) | Minor - stale cache shows old content |
| CSR (client-side only) | Empty shell | High - content may not be indexed |
URL parameter handling
URL parameters are a major source of duplicate content. Common problematic patterns:
- Tracking parameters: `?utm_source=email&utm_campaign=launch`
- Faceted navigation: `?color=red&size=M&sort=price`
- Session IDs: `?sessionid=abc123`
- Pagination: `?page=2`

Handle with canonical tags pointing to the clean URL, or robots.txt `Disallow` rules for pure tracking parameters. (Google Search Console's URL Parameters tool has been retired and is no longer an option.)
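The normalization rules above can be sketched as a small helper. This is an illustrative sketch only: the tracking-parameter list and the www/trailing-slash policy are assumptions - match them to your own canonical policy.

```typescript
// Hypothetical helper: strip tracking params and normalize ordering so
// URL variants collapse to a single canonical form.
const TRACKING_PARAMS = new Set([
  'utm_source', 'utm_medium', 'utm_campaign', 'gclid', 'fbclid', 'sessionid',
]);

function canonicalizeUrl(raw: string): string {
  const url = new URL(raw);
  url.protocol = 'https:';                            // enforce HTTPS
  url.hostname = url.hostname.replace(/^www\./, '');  // enforce non-www (assumed policy)
  // Drop tracking parameters entirely
  for (const key of [...url.searchParams.keys()]) {
    if (TRACKING_PARAMS.has(key)) url.searchParams.delete(key);
  }
  url.searchParams.sort();                            // stable parameter order
  // Enforce no trailing slash (except root)
  if (url.pathname.length > 1 && url.pathname.endsWith('/')) {
    url.pathname = url.pathname.slice(0, -1);
  }
  url.hash = '';
  return url.toString();
}
```

Run this on every internally generated link so the canonical form is the only form your templates ever emit.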
Mobile-first indexing
Google indexes and ranks primarily based on the mobile version of your content. Ensure the mobile version has: the same content as desktop, the same structured data, and equivalent meta tags. Blocked mobile CSS/JS is a common cause of mobile-first indexing failures.
Common tasks
Configure robots.txt
```
# Allow all crawlers to access all content (default, no file needed)
User-agent: *
Allow: /

# Block specific directories from all crawlers
User-agent: *
Disallow: /admin/
Disallow: /internal-search/
Disallow: /checkout/
Disallow: /*?*sessionid=  # block session ID URLs anywhere on the site

# Allow Googlebot to crawl CSS and JS (critical - never block these)
User-agent: Googlebot
Allow: /*.js$
Allow: /*.css$

# Point to sitemap
Sitemap: https://example.com/sitemap.xml
```
Never disallow CSS or JS. Googlebot needs them to render your pages. Blocking them degrades rendering quality and can hurt rankings.
Generate an XML sitemap
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/products/widget</loc>
    <lastmod>2024-01-10</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```
For large sites, use a sitemap index:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemaps/products.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemaps/blog.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
</sitemapindex>
```
Sitemap rules: max 50,000 URLs per file, max 50MB uncompressed. Only include canonical, indexable URLs. Only include `lastmod` if it reflects genuine content changes - Googlebot learns to ignore dishonest `lastmod` values.
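The rules above can be enforced at generation time. A sketch - the file naming scheme (`/sitemaps/sitemap-N.xml`) and the entry shape are assumptions, not part of the sitemap spec:

```typescript
// Sketch: emit sitemap XML chunked at the 50,000-URL spec limit, plus an
// index file referencing each chunk. Types and paths are illustrative.
interface SitemapEntry { loc: string; lastmod?: string; }

const MAX_URLS_PER_SITEMAP = 50000;

function buildSitemap(entries: SitemapEntry[]): string {
  const urls = entries.map((e) =>
    `  <url>\n    <loc>${e.loc}</loc>\n` +
    (e.lastmod ? `    <lastmod>${e.lastmod}</lastmod>\n` : '') +
    `  </url>`
  ).join('\n');
  return `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>`;
}

function buildSitemapFiles(entries: SitemapEntry[], baseUrl: string): { index: string; sitemaps: string[] } {
  const sitemaps: string[] = [];
  // Chunk at the spec limit so no single file exceeds 50,000 URLs
  for (let i = 0; i < entries.length; i += MAX_URLS_PER_SITEMAP) {
    sitemaps.push(buildSitemap(entries.slice(i, i + MAX_URLS_PER_SITEMAP)));
  }
  const refs = sitemaps.map((_, i) =>
    `  <sitemap>\n    <loc>${baseUrl}/sitemaps/sitemap-${i + 1}.xml</loc>\n  </sitemap>`
  ).join('\n');
  const index = `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${refs}\n</sitemapindex>`;
  return { index, sitemaps };
}
```

Feed it only canonical, indexable URLs, and only pass `lastmod` when the content genuinely changed.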
Set up canonical URLs
In the `<head>` element:
```html
<link rel="canonical" href="https://example.com/products/widget" />
```
Handle all URL variants consistently:
```html
<!-- All of these should resolve to one canonical form -->
<!-- https://example.com/products/widget/ -->
<!-- https://example.com/products/widget -->
<!-- http://example.com/products/widget -->
<!-- https://www.example.com/products/widget -->

<!-- All pages declare the same canonical -->
<link rel="canonical" href="https://example.com/products/widget" />
```
For paginated pages, each page is canonically itself (do not canonical page 2 to page 1 unless they have identical content):
```html
<!-- Page 1 -->
<link rel="canonical" href="https://example.com/blog" />
<!-- Page 2 -->
<link rel="canonical" href="https://example.com/blog?page=2" />
```
Choose a rendering strategy
Decision table for ranking pages (pages you want to appear in search):
| Content type | Recommended strategy | Rationale |
|---|---|---|
| Marketing pages, landing pages | SSG | Crawled immediately, fast TTFB |
| Blog posts, documentation | SSG | Rarely changes, build on publish |
| Product pages (10k-100k) | ISR | Manageable builds, auto-updates |
| User profiles, social content | SSR | Personalized but crawlable |
| Search results, filters | SSR + canonical | Crawlable canonical version |
| Dashboards, account pages | CSR is fine | Behind auth, not indexed anyway |
For Next.js:
```js
// SSG - crawled immediately, best for ranking pages
export async function generateStaticParams() { ... }

// ISR - rebuilds on demand, good for large catalogs
export const revalidate = 3600; // revalidate every hour

// SSR - server renders on every request
export const dynamic = 'force-dynamic';
```
Fix redirect chains
Redirect chains occur when A -> B -> C instead of A -> C directly. Detect and fix:
```shell
# Detect redirect chain depth with curl
curl -L -o /dev/null -s -w "%{url_effective} hops: %{num_redirects}\n" \
  https://example.com/old-page

# Follow the chain step by step
curl -I https://example.com/old-page
# Note Location header, then:
curl -I https://example.com/intermediate-page
```
Fix by updating the origin redirect to point directly to the final URL:
```nginx
# Before: /old-page -> /intermediate -> /final-page (chain)
# After:  /old-page -> /final-page (single hop)
rewrite ^/old-page$ /final-page permanent;
```
Rules:
- 301 = permanent redirect (passes link equity, cached by browsers)
- 302 = temporary redirect (does not pass full link equity, not cached)
- Use 301 for SEO unless the redirect is genuinely temporary
- Client-side redirects (`window.location`, meta refresh) do not reliably pass link equity. Always redirect at the server or CDN layer.
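The single-hop rule can be enforced when generating redirect config. A sketch, assuming your redirects live in a simple source-to-target map:

```typescript
// Sketch: flatten a redirect map so every source points directly at its
// final destination, and flag loops. The map shape is illustrative.
function flattenRedirects(redirects: Record<string, string>): Record<string, string> {
  const flattened: Record<string, string> = {};
  for (const source of Object.keys(redirects)) {
    const seen = new Set<string>([source]);
    let target = redirects[source];
    // Follow the chain until we reach a URL with no further redirect
    while (target in redirects) {
      if (seen.has(target)) throw new Error(`Redirect loop at ${target}`);
      seen.add(target);
      target = redirects[target];
    }
    flattened[source] = target; // single hop: source -> final destination
  }
  return flattened;
}
```

Running this before emitting nginx `rewrite` rules (or a CDN redirect list) guarantees every redirect is a single 301 hop.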
Handle URL parameters for faceted navigation
Faceted navigation generates an exponential number of URL combinations. Choose one:
Option A: Canonical to the base category page (simplest)
```html
<!-- /products?color=red&size=M&sort=price -->
<link rel="canonical" href="https://example.com/products" />
```
Option B: robots.txt disallow parameter combinations
```
User-agent: *
Disallow: /*?*color=
Disallow: /*?*size=
Disallow: /*?*sort=
```
Option C: Noindex on parameterized pages
```html
<meta name="robots" content="noindex, follow" />
```
Option A is preferred when the canonical page has good content. Option B is useful when you want to conserve crawl budget. Option C is the fallback when you need to serve the page to users but not have it indexed.
Set up meta robots directives
In the HTML `<head>`:
```html
<!-- Default: crawl and index (no tag needed) -->
<meta name="robots" content="index, follow" />
<!-- Do not index, but follow links on this page -->
<meta name="robots" content="noindex, follow" />
<!-- Do not index, do not follow links -->
<meta name="robots" content="noindex, nofollow" />
<!-- Prevent Google from showing a cached version -->
<meta name="robots" content="index, follow, noarchive" />
```
Via HTTP response header (works for non-HTML resources like PDFs):
```
X-Robots-Tag: noindex
X-Robots-Tag: noindex, nofollow
```
Debug indexing issues
When a page is not indexed, work through this checklist in order:
- URL Inspection tool in Search Console - checks crawl status, last crawl, indexing decision, and renders a screenshot of what Googlebot sees
- robots.txt tester - confirm the URL is not blocked
- Live URL test - request indexing and see if Googlebot can render the page
- Check for noindex - view source and search for `noindex`, check HTTP headers
- Check canonical - is the canonical pointing to a different URL?
- Check content - is there enough unique, substantive content?
- Check internal links - is the page linked from anywhere Googlebot can reach?
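Steps 4 and 5 of the checklist can be partially automated. A regex-based sketch for illustration only - a real audit should use an HTML parser, and these regexes assume `content`/`href` appear after `name`/`rel` in attribute order:

```typescript
// Sketch: scan a fetched page's HTML for indexing blockers.
interface IndexCheck { noindex: boolean; canonical: string | null; }

function checkIndexability(html: string): IndexCheck {
  // Assumes attribute order: name before content, rel before href
  const metaRobots = html.match(/<meta[^>]+name=["']robots["'][^>]*content=["']([^"']+)["']/i);
  const canonicalMatch = html.match(/<link[^>]+rel=["']canonical["'][^>]*href=["']([^"']+)["']/i);
  return {
    noindex: metaRobots !== null && /noindex/i.test(metaRobots[1]),
    canonical: canonicalMatch !== null ? canonicalMatch[1] : null,
  };
}
```

If `noindex` is true or `canonical` points at a different URL than the one you inspected, you have found the indexing blocker.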
Anti-patterns / common mistakes
| Mistake | Why it is wrong | What to do instead |
|---|---|---|
| Blocking CSS/JS in robots.txt | Googlebot cannot render pages, sees empty shells | Allow: /*.js$ and Allow: /*.css$ explicitly |
| Dishonest lastmod in sitemap | Googlebot learns to ignore it; all URLs get low-priority crawls | Only update lastmod on genuine content changes |
| CSR-only rendering for rankable pages | Content in JS is not seen on first crawl; delayed or failed indexing | Use SSG or SSR for any page you want in search results |
| Client-side redirects for SEO | Meta refresh and JS redirects do not reliably pass link equity | Redirect at server/CDN level with 301 |
| Using robots.txt to prevent indexing | Blocked pages can still be indexed as empty/thin if linked to | Use noindex directive in response headers or meta tag |
| Self-referential canonical loops | Page A canonicals to B, B canonicals to A; Google ignores both | Each URL canonicals to a single definitive URL |
| Duplicate canonicals pointing to 404s | Signals to Google the canonical URL is invalid | Ensure canonical targets return 200 with real content |
| Trailing slash inconsistency | Two URLs for every page, dilutes crawl budget and link signals | Enforce one form at the server, canonical the other |
| Noindex on paginated pages in series | First page gets indexed without context of full series | Only noindex pagination if pages are truly thin/duplicate |
| Sitemap URLs not matching canonicals | Confuses Googlebot about which URL is authoritative | Sitemap URLs must exactly match their canonical <link> tag |
Gotchas
Canonical tags are a hint, not a directive - Google can and does override canonical tags if it disagrees with your signal. If your site serves near-identical content at two URLs and one has more internal links, Google may index the more-linked URL regardless of your canonical. Canonical must be reinforced with consistent internal linking and redirects.
Blocking CSS/JS in robots.txt causes Googlebot to see a broken page - Even a single blocked CSS file can prevent Googlebot from rendering a page correctly, leading to indexing of an empty or unstyled shell. Always test with the URL Inspection tool's "Test Live URL" option to confirm Googlebot's rendered view.
noindex in robots.txt does not work - The `noindex` directive in robots.txt is not a valid robots.txt directive; it is ignored. The only valid `noindex` placement is in the HTTP response header (`X-Robots-Tag`) or the HTML `<meta name="robots">` tag on the page itself.
Hreflang is validated in both directions - If page A in English points to page B in French, page B must also point back to page A. Missing the return link causes Google to ignore the entire hreflang cluster. Validate every hreflang implementation bidirectionally.
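The bidirectional check can be automated. A sketch, assuming you have already extracted each page's declared hreflang alternates into a map:

```typescript
// Sketch: verify hreflang reciprocity. `pages` maps a URL to the alternate
// URLs it declares; every declared alternate must declare a link back.
function findMissingReturnLinks(pages: Record<string, string[]>): [string, string][] {
  const missing: [string, string][] = [];
  for (const [url, alternates] of Object.entries(pages)) {
    for (const alt of alternates) {
      // The alternate must list `url` among its own hreflang targets
      if (!pages[alt] || !pages[alt].includes(url)) {
        missing.push([url, alt]); // url declares alt, but alt does not reciprocate
      }
    }
  }
  return missing;
}
```

Every pair this returns represents a broken hreflang cluster that Google will ignore until the return link is added.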
Sitemap `lastmod` dates that never change train Googlebot to deprioritize your site - Many CMSs emit the current date as `lastmod` on every page regardless of actual changes. Googlebot learns these dates are dishonest and reduces crawl frequency. Only emit `lastmod` when content genuinely changed.
References
For detailed implementation guidance, load the relevant reference file:
- `references/crawlability-indexing.md` - crawl budget optimization, Googlebot behavior, log analysis, orphan pages, internal linking for crawlability
- `references/sitemaps-canonicals.md` - XML sitemap spec details, canonical URL rules, hreflang interaction, pagination handling
- `references/rendering-strategies.md` - SSG/SSR/ISR/CSR comparison, framework implementations (Next.js, Nuxt, Astro, Remix), edge rendering, dynamic rendering
Only load a reference file if the current task requires it - they are long and will consume context.
References
crawlability-indexing.md
Crawlability and Indexing Reference
Deep reference for ensuring Googlebot can discover, crawl, and index your pages efficiently. Covers crawl budget mechanics, log file analysis, orphan page detection, internal linking strategy, and the index coverage pipeline.
1. Googlebot Behavior
How Googlebot discovers URLs
- Sitemap submission via Search Console or the `Sitemap:` directive in robots.txt
- Following links from already-crawled pages (internal and external)
- Submitted URLs via URL Inspection tool in Search Console
- DNS records and link graphs maintained internally
Googlebot does not crawl pages it cannot reach via links or sitemaps. An isolated page - reachable only via direct URL entry - is an orphan and will likely not be discovered or re-crawled.
Crawl frequency signals
Googlebot adjusts crawl frequency based on:
| Signal | Effect |
|---|---|
| High-quality inbound links to the page | Increases crawl priority |
| Accurate lastmod in sitemap | Influences recrawl scheduling |
| Fast server response time | Allows more crawls within same rate limit |
| Consistent 200 status (not flapping) | Builds crawl confidence |
| Fresh content on recrawl | Increases future crawl frequency |
| 5xx errors on crawl | Reduces crawl rate; Google backs off |
Crawl rate vs crawl demand
Crawl rate: How fast Googlebot is allowed to crawl. Controlled by server response speed and the crawl rate setting (adjustable in Search Console, though rarely needed). Google auto-adjusts to avoid overloading your server.
Crawl demand: How much Google wants to crawl your site. Driven by page popularity, freshness signals, and link graph centrality. You influence demand by improving content quality and earning links.
Crawl budget = effective crawls per day = min(crawl rate, crawl demand).
You cannot directly set crawl budget. You influence it by:
- Removing low-value URLs from the crawl frontier
- Improving server response speed
- Ensuring crawled pages return 200 with real content
- Building internal links to important pages
2. Crawl Budget Optimization
Identify URL budget waste
Common sources of wasted crawl budget:
| Source | Volume | Fix |
|---|---|---|
| Faceted navigation permutations | Can be millions | Canonicals + robots.txt disallow |
| Paginated URLs beyond page 2-3 | Grows with content | Noindex or remove from sitemap |
| Session IDs in URLs | Per-visitor | Block in robots.txt |
| Tracking parameters | Per-campaign | Canonical to clean URL |
| Duplicate content with/without trailing slash | 2x all URLs | 301 redirect to canonical form |
| Legacy redirect targets still in sitemap | Each wastes a hop | Update sitemap to final URLs |
| Soft 404 pages | Crawled but wasted | Fix to return 404/410 or real content |
| Infinite scroll / calendar archives | Unbounded URL space | Pagination with disallow or noindex |
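The waste patterns in the table can be triaged automatically over a crawl or log export. A sketch with illustrative patterns - the parameter names and the pagination threshold are assumptions to tune per site:

```typescript
// Sketch: classify a URL into a crawl-budget waste category, or null
// if it matches no obvious waste pattern.
function classifyWaste(url: string): string | null {
  const u = new URL(url, 'https://example.com'); // base for relative paths
  const params = [...u.searchParams.keys()];
  if (params.some((p) => /^sessionid$/i.test(p))) return 'session-id';
  if (params.some((p) => /^utm_/i.test(p))) return 'tracking-parameter';
  if (params.some((p) => ['color', 'size', 'sort'].includes(p))) return 'faceted-navigation';
  const page = u.searchParams.get('page');
  if (page && Number(page) > 3) return 'deep-pagination'; // beyond page 2-3
  return null;
}
```

Aggregating the non-null results by category shows where your crawl budget is leaking before you write a single robots.txt rule.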
Crawl budget audit checklist
1. Pull log file data (see section 3 below)
2. Count unique URLs Googlebot fetched in the last 30 days
3. What % returned 200? Non-200 is wasted budget.
4. What % are canonical? Non-canonical crawls waste budget.
5. What % have noindex? These should ideally not be crawled.
6. Identify top 20 URL templates by volume - are these all valuable?
7. Check robots.txt is not accidentally allowing junk URLs
8. Check sitemap only contains canonical 200 pages

Prioritizing pages for crawl
Internal linking is the most powerful lever for influencing crawl priority. Pages with more internal links get crawled more often. Structure your internal linking to concentrate crawls on:
- High-value landing pages (highest revenue/conversion intent)
- Recently updated content (freshness signals)
- Newly published content (discovery)
- Deep pages in large catalogs
Deprioritize by removing internal links from:
- Thin pages with low-value content
- Near-duplicate pages
- Paginated pages beyond first 1-2 pages
3. Log File Analysis
Server access logs are the ground truth for what Googlebot actually crawled. More reliable than Search Console for diagnosing crawl issues.
Extract Googlebot requests from logs
```shell
# Apache/Nginx combined log format
# (UA strings can be spoofed - verify with reverse DNS for high-stakes audits)
grep "Googlebot" /var/log/nginx/access.log | \
  grep -v "AdsBot\|APIs-Google\|Mediapartners" > googlebot.log

# Count requests by URL pattern
awk '{print $7}' googlebot.log | \
  sed 's/?.*$//' | \
  sort | uniq -c | sort -rn | head -50

# Count by status code
awk '{print $9}' googlebot.log | sort | uniq -c | sort -rn

# Most crawled URLs on the day one week ago (adjust the date for other windows)
grep "$(date -d '7 days ago' '+%d/%b/%Y')" googlebot.log | \
  awk '{print $7}' | sort | uniq -c | sort -rn | head -100
```
Key metrics to compute from logs
| Metric | Formula | Target |
|---|---|---|
| Crawl efficiency | 200 responses / total requests | >90% |
| Canonical ratio | canonical-URL requests / total requests | >85% |
| Noindex crawl waste | noindex page requests / total requests | <5% |
| Redirect ratio | 3xx responses / total requests | <5% |
| Error ratio | 4xx + 5xx responses / total requests | <1% |
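Given aggregated counts from the log commands above, the status-based metrics (crawl efficiency, redirect ratio, error ratio) reduce to simple ratios. A sketch - the input shape is an assumption:

```typescript
// Sketch: compute the table's status-based crawl metrics from
// aggregated log counts.
interface LogCounts {
  total: number;
  status200: number;
  status3xx: number;
  status4xx: number;
  status5xx: number;
}

function crawlMetrics(c: LogCounts) {
  return {
    crawlEfficiency: c.status200 / c.total,           // target > 0.90
    redirectRatio: c.status3xx / c.total,             // target < 0.05
    errorRatio: (c.status4xx + c.status5xx) / c.total, // target < 0.01
  };
}
```

The canonical ratio and noindex-waste metrics need a second data source (your canonical URL list and noindex inventory) joined against the same log.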
Identify unexpected Googlebot behavior
```shell
# Find URLs Googlebot crawled that are NOT in your sitemap
# (potential orphan pages or leaked URLs)
# Note: strip scheme+host from sitemap <loc> values so both sides
# compare bare paths
comm -23 \
  <(awk '{print $7}' googlebot.log | sed 's/?.*$//' | sort -u) \
  <(grep "<loc>" sitemap.xml | sed 's/.*<loc>//;s/<\/loc>//;s|https\?://[^/]*||' | sort -u)

# Find sitemap URLs that Googlebot has NOT crawled in 30 days
# (potential crawl budget or discovery issue)
comm -23 \
  <(grep "<loc>" sitemap.xml | sed 's/.*<loc>//;s/<\/loc>//;s|https\?://[^/]*||' | sort -u) \
  <(awk '{print $7}' googlebot.log | sed 's/?.*$//' | sort -u)
```
4. Orphan Page Detection
Orphan pages are indexed pages with no internal links pointing to them. They receive no crawl signal from the internal link graph and may drop out of the index.
Detection methods
Method 1: Crawl your site, then compare to Search Console
```shell
# Crawl site with Screaming Frog or similar to get all internally linked URLs
# Export from Search Console: all indexed URLs
# Compare:
comm -13 <(sort crawled-internal-links.txt) <(sort search-console-indexed.txt)
# Lines appearing only in Search Console = orphan candidates
```
Method 2: Compare sitemap to crawl
Pages in your sitemap that are not discovered via internal link crawl are likely orphaned - they exist and are submitted, but nothing links to them.
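Both methods reduce to a set difference. A sketch, assuming you have the two URL lists as arrays:

```typescript
// Sketch: orphan candidates = URLs known to exist (sitemap or Search
// Console export) minus URLs reachable via internal links.
function orphanCandidates(knownUrls: string[], internallyLinked: string[]): string[] {
  const linked = new Set(internallyLinked);
  return knownUrls.filter((url) => !linked.has(url)).sort();
}
```

Normalize both lists first (see the canonicalization rules earlier) so a trailing-slash mismatch does not produce false orphans.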
Fixing orphan pages
Priority fix order:
- Add internal links from high-crawl-frequency pages (homepage, category pages)
- Add to a site-wide feature (footer links, related content, breadcrumbs)
- If the page has no value, remove it and 410 the URL
- If the page has value but no natural fit, add to XML sitemap (minimum signal)
5. Internal Linking for Crawlability
Internal links are how crawl budget propagates through your site. Think of it like PageRank - pages with more internal links get more crawl budget allocated.
Link architecture principles
Flat is better than deep: A page 6 clicks from the homepage will be crawled much less frequently than a page 2 clicks away. Keep important pages within 3 clicks of the homepage.
Crawl depth chart:
| Depth from homepage | Relative crawl frequency |
|---|---|
| 1 (homepage links) | Very high |
| 2 | High |
| 3 | Medium |
| 4 | Low |
| 5+ | Very low - potential orphan risk |
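Click depth is the shortest-path distance from the homepage over the internal link graph, computable with a breadth-first search. A sketch - the graph shape (URL to outbound internal links) is an assumption:

```typescript
// Sketch: compute click depth from the homepage via BFS. Pages absent
// from the result map are unreachable = orphans.
function clickDepths(graph: Record<string, string[]>, home: string): Map<string, number> {
  const depth = new Map<string, number>([[home, 0]]);
  const queue: string[] = [home];
  while (queue.length > 0) {
    const url = queue.shift()!;
    for (const next of graph[url] ?? []) {
      if (!depth.has(next)) {
        depth.set(next, depth.get(url)! + 1);
        queue.push(next); // first discovery = shortest click path
      }
    }
  }
  return depth;
}
```

Pages at depth 4+ (or missing entirely) are the ones to surface via new internal links.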
High-leverage internal link placements
| Placement | Why effective |
|---|---|
| Homepage | Highest crawl frequency, passes max crawl signal |
| Global navigation | Appears on every page, strong signal |
| Breadcrumbs | Systematic coverage of category hierarchy |
| Contextual body links | Strongest quality signal (topically related) |
| Sitelinks / footer | Network coverage, ensures every page is reachable |
| "Related content" modules | Connects content clusters |
Common internal linking failures
- JavaScript-only links: Links that exist only in JS event handlers or are injected by JS after page load may not be followed. Use `<a href="">` tags.
- Nofollow internal links: `rel="nofollow"` on internal links wastes crawl budget allocation. Reserve nofollow for external links you don't want to vouch for.
- Links in iframes: Googlebot may not follow links inside iframes.
- Links behind forms or auth: Googlebot cannot submit forms or authenticate. Any page requiring form submission to reach is effectively unreachable.
6. Soft 404 Handling
A soft 404 is a page that returns HTTP 200 but contains content indicating the page does not exist (empty search results, "product not found", etc.). Google detects these and treats them poorly.
Detection
In Google Search Console, soft 404s appear under Index Coverage > "Crawled - not currently indexed" with the reason "Soft 404". Also detectable in log analysis: URLs returning 200 with very short response body size.
Fix strategies
| Scenario | Fix |
|---|---|
| Out-of-stock product page | Keep page, show alternatives. Do not soft 404. |
| Discontinued product, no alternative | Return 404 or 410, or 301 to category |
| Empty search results page | Noindex via meta tag, or 200 with good content |
| User profile that no longer exists | Return 404 |
| Category with no products | Either 404 or 301 to parent, not empty 200 |
```js
// Next.js App Router: return a proper 404 instead of an empty 200 page
import { notFound } from 'next/navigation';

export async function generateMetadata({ params }) {
  const product = await getProduct(params.slug);
  if (!product) {
    notFound(); // triggers a 404 response and renders not-found.tsx
  }
}
```
7. Index Coverage Report Interpretation
Search Console Index Coverage report categories:
| Status | Meaning | Action |
|---|---|---|
| Valid | Indexed and serving | Monitor for unexpected drops |
| Valid with warning | Indexed but flagged (e.g., submitted in sitemap but canonical is different) | Fix the canonical mismatch |
| Excluded - Crawled, not indexed | Google crawled but chose not to index | Improve content quality or add noindex intentionally |
| Excluded - Discovered, not crawled | In queue but not yet crawled | Improve crawl budget or add more internal links |
| Excluded - Duplicate | Treated as duplicate of another URL | Check canonical chain |
| Excluded - Noindex | noindex directive found | Intentional? If not, remove the directive |
| Error - 404 | Page returned 404 | Remove from sitemap if intentional, fix if accidental |
| Error - Soft 404 | 200 but thin/empty content | See soft 404 section above |
| Error - Redirect error | Redirect loop or chain too long | Fix redirects |
| Error - robots.txt blocked | Blocked from crawling | Intentional? If not, update robots.txt |
Monitoring workflow
Set up weekly review of:
- Total indexed count trend - sudden drops signal a problem
- New errors appearing in the report
- "Discovered, not crawled" volume - high numbers suggest crawl budget constraints
- "Excluded - Duplicate" volume - high numbers suggest canonicalization issues
rendering-strategies.md
Rendering Strategies for SEO Reference
How a page is rendered determines whether Googlebot sees your content immediately or must wait for a delayed JavaScript render. This reference covers the rendering strategies, their SEO trade-offs, framework implementations, and edge cases.
1. Rendering Strategy Comparison Matrix
| Strategy | Full name | HTML on first crawl? | Build time impact | Data freshness | SEO risk |
|---|---|---|---|---|---|
| SSG | Static Site Generation | Yes - full HTML | Scales with page count | Stale until rebuild | None |
| SSR | Server-Side Rendering | Yes - full HTML | None | Always fresh | None |
| ISR | Incremental Static Regeneration | Yes (on cache hit) | Low - builds on demand | Stale until revalidation | Low - stale cache |
| CSR | Client-Side Rendering | No - empty shell | None | Always fresh | High for rankable pages |
| Hybrid SSG+CSR | Static shell, client fetches data | Partial - shell only | Low | Fresh after hydration | Medium - depends on what CSR fetches |
| Edge SSR | SSR at CDN edge | Yes - full HTML | None | Always fresh | None |
| Dynamic rendering | Serve HTML to bots, CSR to users | Yes (for Googlebot) | None | Always fresh | Medium - requires maintenance |
2. Decision Framework by Content Type
```
Is this page behind authentication?
  YES -> CSR is fine. Google cannot and should not index it.
  NO  -> Continue...

Does this page need to rank in search results?
  NO  -> Any strategy works. Choose based on UX requirements.
  YES -> Continue...

Does the content change frequently (multiple times per day)?
  YES -> SSR or Edge SSR
  NO  -> Continue...

Is your page count large enough that SSG builds take >10 minutes?
  YES (100k+ pages) -> ISR
  NO  -> SSG (safest, simplest)
```
Content type recommendations
| Page type | Strategy | Reasoning |
|---|---|---|
| Homepage | SSG | Rarely changes, highest visibility, fastest TTFB |
| Landing pages | SSG | Infrequent changes, need instant indexing |
| Blog posts / articles | SSG | Write-once, ideal for build-time generation |
| Documentation | SSG | Static content, version-controlled |
| Product detail pages (small catalog) | SSG | Fast, always-indexed |
| Product detail pages (large catalog) | ISR | Build-time impractical at scale |
| Category / listing pages | SSG or ISR | Depends on how often product mix changes |
| News / feed pages | SSR | Must reflect current state |
| User profiles (public) | SSR | Personalized, indexed, dynamic |
| Search results pages | SSR + canonical | Crawlable version needed |
| Account/dashboard pages | CSR | Not indexed, no SEO concern |
| Admin interfaces | CSR | Not indexed, no SEO concern |
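The decision framework above can be encoded directly. A sketch - the trait names are illustrative:

```typescript
// Sketch: the rendering-strategy decision tree as a function.
interface PageTraits {
  behindAuth: boolean;
  needsToRank: boolean;
  changesManyTimesPerDay: boolean;
  hugeCatalog: boolean; // e.g. 100k+ pages, SSG builds > 10 min
}

function chooseRenderingStrategy(p: PageTraits): string {
  if (p.behindAuth) return 'CSR';           // not indexed anyway
  if (!p.needsToRank) return 'any';         // pick on UX grounds
  if (p.changesManyTimesPerDay) return 'SSR';
  if (p.hugeCatalog) return 'ISR';
  return 'SSG';                             // safest default for rankable pages
}
```

The ordering matters: authentication short-circuits everything, and SSG is the fall-through default because it carries zero SEO risk.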
3. Framework Implementation Guide
Next.js (App Router)
// SSG: page is generated at build time (default for static data)
// File: app/blog/[slug]/page.tsx
export async function generateStaticParams() {
const posts = await getPosts();
return posts.map((post) => ({ slug: post.slug }));
}
export default async function BlogPost({ params }) {
const post = await getPost(params.slug); // fetched at build time
return <article>{post.content}</article>;
}// ISR: regenerate on access, revalidate after N seconds
// File: app/products/[slug]/page.tsx
export const revalidate = 3600; // revalidate every hour
export default async function ProductPage({ params }) {
const product = await getProduct(params.slug);
return <ProductDetail product={product} />;
}
// On-demand ISR: trigger revalidation from a webhook
// File: app/api/revalidate/route.ts
import { revalidatePath } from 'next/cache';
export async function POST(request: Request) {
const { slug } = await request.json();
revalidatePath(`/products/${slug}`);
return Response.json({ revalidated: true });
}
// SSR: render on every request
// File: app/news/page.tsx
export const dynamic = 'force-dynamic';
export default async function NewsPage() {
const articles = await getLatestNews(); // fetched on every request
return <NewsFeed articles={articles} />;
}
// Metadata for SEO - works with all rendering strategies
// File: app/products/[slug]/page.tsx
import { Metadata } from 'next';
export async function generateMetadata({ params }): Promise<Metadata> {
const product = await getProduct(params.slug);
return {
title: product.name,
description: product.description,
alternates: {
canonical: `https://example.com/products/${params.slug}`,
},
openGraph: {
title: product.name,
description: product.description,
images: [product.imageUrl],
},
};
}
Next.js (Pages Router, legacy)
// SSG with getStaticProps
export async function getStaticProps({ params }) {
const product = await getProduct(params.slug);
return {
props: { product },
revalidate: 3600, // ISR: revalidate every hour
};
}
// SSR with getServerSideProps
export async function getServerSideProps({ params }) {
const news = await getLatestNews();
return { props: { news } };
}
Nuxt 3
// SSG: generate at build time
// nuxt.config.ts
export default defineNuxtConfig({
nitro: {
prerender: {
crawlLinks: true,
routes: ['/sitemap.xml'],
},
},
});
// pages/products/[slug].vue
const { data: product } = await useAsyncData(
`product-${route.params.slug}`,
() => $fetch(`/api/products/${route.params.slug}`)
);
// ISR-equivalent with cache headers
// server/routes/products/[slug].ts
export default defineCachedEventHandler(async (event) => {
const slug = getRouterParam(event, 'slug');
return await getProduct(slug);
}, {
maxAge: 60 * 60, // 1 hour
staleMaxAge: 60 * 60 * 24, // serve stale for 24h while revalidating
});
<!-- SEO meta in Nuxt -->
<script setup>
useSeoMeta({
title: product.name,
description: product.description,
ogTitle: product.name,
ogDescription: product.description,
});
useHead({
link: [
{ rel: 'canonical', href: `https://example.com/products/${product.slug}` }
]
});
</script>
Astro
Astro defaults to SSG with optional server-side rendering per route.
---
// src/pages/products/[slug].astro
// SSG: default behavior - rendered at build time
export async function getStaticPaths() {
const products = await getProducts();
return products.map(product => ({
params: { slug: product.slug },
props: { product },
}));
}
const { product } = Astro.props;
---
<html>
<head>
<title>{product.name}</title>
<meta name="description" content={product.description} />
<link rel="canonical" href={`https://example.com/products/${product.slug}`} />
</head>
<body>
<h1>{product.name}</h1>
</body>
</html>
// astro.config.mjs - enable SSR for specific routes
export default defineConfig({
output: 'hybrid', // SSG by default, SSR opt-in
adapter: node({ mode: 'standalone' }),
});
---
// Enable SSR for this specific page
export const prerender = false; // override default SSG
const news = await fetch('/api/news').then(r => r.json());
---
Remix
Remix is SSR-first. All routes render on the server by default.
// app/routes/products.$slug.tsx
import { LoaderFunctionArgs, MetaFunction } from '@remix-run/node';
import { useLoaderData } from '@remix-run/react';
export async function loader({ params }: LoaderFunctionArgs) {
const product = await getProduct(params.slug!);
if (!product) throw new Response('Not Found', { status: 404 });
return product;
}
export const meta: MetaFunction<typeof loader> = ({ data }) => {
return [
{ title: data?.name },
{ name: 'description', content: data?.description },
{ tagName: 'link', rel: 'canonical', href: `https://example.com/products/${data?.slug}` },
];
};
export default function ProductPage() {
const product = useLoaderData<typeof loader>();
return <ProductDetail product={product} />;
}
For SSG-like behavior in Remix, use HTTP caching headers:
export function headers() {
return {
'Cache-Control': 'public, max-age=3600, s-maxage=86400, stale-while-revalidate=86400',
};
};
4. Edge Rendering
Edge rendering runs SSR at CDN edge nodes geographically close to the user. SEO characteristics are identical to standard SSR - Googlebot receives full HTML. The benefit is performance: lower TTFB improves Core Web Vitals, a positive ranking signal.
When to use edge rendering
- SSR pages where TTFB > 200ms from major geographic markets
- Personalized content that must be indexed (e.g., localized pricing)
- Edge A/B testing where Googlebot must consistently receive a single variant
Edge rendering in Next.js
// app/products/[slug]/page.tsx
export const runtime = 'edge';
export const dynamic = 'force-dynamic';
export default async function ProductPage({ params }) {
const product = await getProduct(params.slug);
return <ProductDetail product={product} />;
}
Constraints: Edge runtime does not support all Node.js APIs. Avoid database connections in edge functions; use edge-compatible APIs or KV stores instead.
5. Dynamic Rendering (Escape Hatch)
Dynamic rendering serves pre-rendered HTML to crawlers and CSR to regular users. It is an escape hatch for legacy apps that cannot easily migrate to SSR/SSG.
Implementation with a rendering proxy
# Route Googlebot and other bots to a pre-rendering service
map $http_user_agent $is_bot {
default 0;
~*googlebot 1;
~*bingbot 1;
~*slurp 1;
~*duckduckbot 1;
}
server {
location / {
if ($is_bot) {
proxy_pass https://prerender-service.example.com;
}
# Regular users get the CSR app
try_files $uri $uri/ /index.html;
}
}
Risks of dynamic rendering
- Cloaking concern: Serving different HTML to bots vs users. Google explicitly permits dynamic rendering as a workaround but views SSR as the correct long-term solution. If the bot version is significantly different from the user version, it may be treated as cloaking.
- Maintenance burden: Two rendering paths to maintain. The bot version can diverge from the user version over time.
- Prerender service latency: Adds latency to Googlebot crawls.
Use dynamic rendering only as a transitional strategy while migrating to SSR/SSG.
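The nginx map above can also live in application middleware. A minimal TypeScript sketch of the bot check - the pattern list is illustrative and not exhaustive, and user-agent matching alone is spoofable (verify Googlebot via reverse DNS for anything security-sensitive):

```typescript
// Known-crawler patterns, mirroring the nginx map above.
// This list is an illustrative subset, not a complete inventory.
const BOT_PATTERNS: RegExp[] = [
  /googlebot/i,
  /bingbot/i,
  /slurp/i,       // Yahoo
  /duckduckbot/i,
];

export function isKnownBot(userAgent: string | undefined): boolean {
  if (!userAgent) return false;
  return BOT_PATTERNS.some((re) => re.test(userAgent));
}
```

In a middleware layer, a `true` result would route the request to the prerender proxy; everyone else falls through to the CSR app.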
6. Hydration and SEO Implications
Hydration is the process of attaching React (or other framework) event handlers to server-rendered HTML in the browser. It does not affect what Googlebot sees on the first crawl but it affects user experience metrics.
Hydration failures that affect SEO
Hydration mismatch: When the server-rendered HTML differs from what React renders on the client, React discards the server HTML and re-renders from scratch. During this process, content may disappear briefly, which can affect Core Web Vitals.
// Common cause: branching on client-only mount state, so the server
// renders nothing where the client renders content
import { useEffect, useState } from 'react';

function MountGate({ children }) {
  const [mounted, setMounted] = useState(false);
  useEffect(() => setMounted(true), []);
  // WRONG: server HTML is empty, client renders children -> hydration mismatch
  return mounted ? children : null;
}

// BETTER: render the content on the server and enhance on the client.
// suppressHydrationWarning is for values that legitimately differ between
// server and client (e.g. timestamps), not a blanket escape hatch.
function Content({ children }) {
  return <div suppressHydrationWarning>{children}</div>;
}
Lazy-loaded content and indexing: Content loaded via React.lazy or dynamic imports only becomes available after the JavaScript bundle loads. For Googlebot's second-wave rendering, this content is typically visible. But for pages where immediate indexing is critical, keep primary content in the initial server-rendered output.
Core Web Vitals and rendering
| Metric | Affected by rendering strategy |
|---|---|
| LCP (Largest Contentful Paint) | SSG/SSR have faster LCP; CSR often has poor LCP |
| CLS (Cumulative Layout Shift) | Hydration mismatches cause CLS; SSG/SSR avoid this |
| INP (Interaction to Next Paint) | Mostly a JavaScript execution concern, not rendering strategy |
| TTFB (Time to First Byte) | SSG fastest (static file); SSR variable; CSR fast shell |
LCP is a ranking signal. CSR pages frequently have poor LCP because the browser must download, parse, and execute JavaScript before painting the largest content element. SSG pages typically have strong LCP.
sitemaps-canonicals.md
Sitemaps and Canonical URLs Reference
Detailed implementation guide for XML sitemaps and canonical URL strategy. These two mechanisms work together to tell search engines which URLs are authoritative and how often to re-crawl them.
1. XML Sitemap Specification
Structure and limits
| Constraint | Value |
|---|---|
| Max URLs per sitemap file | 50,000 |
| Max file size (uncompressed) | 50 MB |
| Max file size (gzipped) | 50 MB after decompression |
| Max sitemaps in a sitemap index | 50,000 |
| Max total URLs across all sitemaps | 2.5 billion (50,000 x 50,000) |
In practice: compress all sitemaps with gzip. A 50 MB uncompressed sitemap compresses to ~5 MB and is fetched much faster.
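A sketch of generating and gzipping a sitemap in Node - `buildSitemap` and its URL-record shape are illustrative helpers, not a real library API:

```typescript
import { gzipSync, gunzipSync } from 'node:zlib';

// Escape the five characters the sitemap spec requires escaping in <loc>.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

export function buildSitemap(urls: { loc: string; lastmod?: string }[]): string {
  const entries = urls
    .map((u) => {
      const lastmod = u.lastmod ? `<lastmod>${u.lastmod}</lastmod>` : '';
      return `<url><loc>${escapeXml(u.loc)}</loc>${lastmod}</url>`;
    })
    .join('');
  return `<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${entries}</urlset>`;
}

// Serve the gzipped bytes with Content-Type: application/xml
// and Content-Encoding: gzip (or name the file sitemap.xml.gz).
export function gzipSitemap(xml: string): Buffer {
  return gzipSync(Buffer.from(xml, 'utf8'));
}
```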
Full URL element attributes
<url>
<!-- Required -->
<loc>https://example.com/products/widget</loc>
<!-- Optional: date of last meaningful content change, ISO 8601 -->
<lastmod>2024-01-15</lastmod>
<!-- Optional: hint for recrawl frequency (Google largely ignores this) -->
<changefreq>weekly</changefreq>
<!-- Optional: relative importance 0.0-1.0 (Google largely ignores this) -->
<priority>0.8</priority>
</url>
changefreq and priority are hints only. Google's crawl scheduling is based
primarily on its own signals. Spend time on accurate lastmod instead.
lastmod accuracy rules
Always update lastmod when: Title changes, body content substantially changes, main image changes, structured data changes.
Do NOT update lastmod when: Template/CSS/JS changes that do not affect page content, sidebar changes that appear on all pages, navigation updates.
Why accuracy matters: Googlebot tracks whether your lastmod values correlate
with actual content changes. If you always set lastmod to today's date, Googlebot
learns to ignore it. If it accurately reflects real changes, Googlebot uses it to
prioritize recrawls.
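One way to enforce these rules is to derive lastmod from a hash of the meaningful-content fields, so template-only deploys never touch it. A sketch, assuming a hypothetical CMS record shape:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical CMS document shape - adapt to your own data model.
interface PageRecord {
  title: string;
  body: string;
  mainImageUrl: string;
  contentHash?: string; // hash recorded at the previous publish
  lastmod?: string;     // ISO 8601 date of last meaningful change
}

export function withAccurateLastmod(page: PageRecord, today: string): PageRecord {
  // Hash only the fields that count as meaningful content; template,
  // sidebar, and navigation changes never enter the hash.
  const hash = createHash('sha256')
    .update(page.title)
    .update('\0')
    .update(page.body)
    .update('\0')
    .update(page.mainImageUrl)
    .digest('hex');

  if (hash === page.contentHash) return page; // nothing meaningful changed
  return { ...page, contentHash: hash, lastmod: today };
}
```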
Sitemap index for large sites
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<!-- Group by content type for easier monitoring -->
<sitemap>
<loc>https://example.com/sitemaps/products-1.xml</loc>
<lastmod>2024-01-15</lastmod>
</sitemap>
<sitemap>
<loc>https://example.com/sitemaps/products-2.xml</loc>
<lastmod>2024-01-15</lastmod>
</sitemap>
<sitemap>
<loc>https://example.com/sitemaps/blog.xml</loc>
<lastmod>2024-01-14</lastmod>
</sitemap>
<sitemap>
<loc>https://example.com/sitemaps/categories.xml</loc>
<lastmod>2024-01-01</lastmod>
</sitemap>
</sitemapindex>
Group sitemaps by content type, not just by size. This lets you monitor indexing rates per content type in Search Console.
What to include (and exclude) from sitemaps
Include:
- All canonical, indexable URLs (returning 200 with real content)
- URLs you actively want Google to discover and crawl
Exclude:
- URLs with a noindex directive (contradictory signal)
- URLs that redirect (include only the final destination)
- Paginated pages beyond page 1 in most cases
- Faceted navigation URLs you want to noindex
- Duplicate content URLs (include only the canonical form)
- Parameter variants (include only the clean URL)
Including non-canonical URLs in your sitemap is a strong anti-pattern. It confuses Googlebot about which version is authoritative.
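These rules can be encoded as a filter over a crawl inventory. A sketch - the `CrawlRecord` shape is a hypothetical audit-tool export, not a real API:

```typescript
// Hypothetical crawl-audit record for one URL.
interface CrawlRecord {
  url: string;
  status: number;     // final HTTP status
  noindex: boolean;   // from meta robots or X-Robots-Tag
  canonical?: string; // declared canonical, if any
}

export function sitemapEligible(r: CrawlRecord): boolean {
  if (r.status !== 200) return false;                     // exclude redirects and errors
  if (r.noindex) return false;                            // contradictory signal
  if (r.canonical && r.canonical !== r.url) return false; // non-canonical variant
  return true;
}
```

Run this over the full inventory before each sitemap regeneration so non-canonical URLs never slip in.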
Submission methods
robots.txt (recommended - always active):
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-index.xml
Google Search Console - Submit via the Sitemaps report. Best for initial submission and monitoring indexing rates per sitemap file.
HTTP ping (deprecated):
https://www.google.com/ping?sitemap=https://example.com/sitemap.xml
Google deprecated this endpoint in 2023 - use robots.txt or Search Console instead.
Dynamic sitemap generation (Next.js example)
// app/sitemap.ts
import { MetadataRoute } from 'next';
export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
const products = await getProducts(); // fetch from CMS/DB
const productUrls = products.map((product) => ({
url: `https://example.com/products/${product.slug}`,
lastModified: product.updatedAt,
changeFrequency: 'weekly' as const,
priority: 0.8,
}));
return [
{
url: 'https://example.com',
lastModified: new Date(),
changeFrequency: 'daily',
priority: 1,
},
...productUrls,
];
}
For very large catalogs (100k+ pages), generate sitemaps statically during build or on a scheduled job rather than dynamically on each request. Dynamic generation under load can become a bottleneck.
2. Canonical URL Rules
The canonical URL is the "master" URL for a piece of content. All ranking signals (links, crawl frequency, page experience data) accumulate on the canonical URL.
The canonical signal hierarchy
Google uses multiple signals to determine the canonical. In order of weight:
- 301 redirect - Strongest signal. URL A redirects to URL B = B is canonical.
- <link rel="canonical"> tag - Strong hint. Not a directive - Google may override it if the canonical seems wrong.
- Sitemap inclusion - Weak hint. Being in the sitemap suggests a URL is canonical.
- Internal links - Weak hint. The URL you link to most often is treated as canonical.
If these signals conflict, Google picks what it considers the most reliable signal, which may differ from your intent.
Self-referencing canonicals
Every page should declare a self-referencing canonical, even if there is no duplicate content risk. This is defensive practice:
<!-- On https://example.com/products/widget -->
<link rel="canonical" href="https://example.com/products/widget" />
<!-- On https://example.com/blog/post-title -->
<link rel="canonical" href="https://example.com/blog/post-title" />
URL variants that require canonical handling
Every site has implicit URL variants. You must pick one form and consistently canonicalize all others to it:
| Variant type | Examples | Pick one form |
|---|---|---|
| Protocol | http:// vs https:// | Always HTTPS |
| Subdomain | www.example.com vs example.com | Pick one, 301 the other |
| Trailing slash | /products/widget/ vs /products/widget | Pick one, be consistent |
| Query parameters | ?ref=footer vs clean URL | Clean URL is canonical |
| Capitalization | /Products/Widget vs /products/widget | Lowercase, enforce at server |
| Index files | /about/index.html vs /about/ | Remove index.html from URLs |
| Fragment | /page#section vs /page | Fragment shares the same canonical as /page |
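Picking one form is only half the job; the server must also enforce it. A sketch of a normalizer implementing one possible policy (HTTPS, non-www, no trailing slash, lowercase paths, tracking parameters stripped, fragment dropped) - adjust the rules to your own chosen standard:

```typescript
// Tracking parameters to strip - an illustrative list, extend as needed.
const TRACKING_PARAMS = new Set(['ref', 'utm_source', 'utm_medium', 'utm_campaign']);

export function canonicalize(raw: string): string {
  const u = new URL(raw);
  u.protocol = 'https:';                              // always HTTPS
  u.hostname = u.hostname.replace(/^www\./, '');      // non-www
  u.pathname = u.pathname
    .toLowerCase()                                    // lowercase paths
    .replace(/\/index\.html$/, '/')                   // drop index files
    .replace(/\/+$/, '') || '/';                      // no trailing slash (keep root)
  u.hash = '';                                        // fragments never change the canonical
  for (const key of [...u.searchParams.keys()]) {
    if (TRACKING_PARAMS.has(key)) u.searchParams.delete(key);
  }
  return u.toString();
}
```

The same function can back both the 301-redirect layer and the canonical-tag generator, so the two signals can never disagree.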
Cross-domain canonicals
For content syndicated or duplicated across domains:
<!-- On https://partner.com/article/original-content -->
<!-- If the original lives on example.com, declare: -->
<link rel="canonical" href="https://example.com/article/original-content" />
Cross-domain canonicals tell Google to attribute all signals to the original domain. Use this for:
- Syndicated content (press releases, guest posts)
- AMP pages canonicalizing to regular pages
- Country-specific domains that share content with the main domain
Common canonical mistakes
Canonical chain: A -> B -> C
<!-- Page A -->
<link rel="canonical" href="https://example.com/page-b" />
<!-- Page B -->
<link rel="canonical" href="https://example.com/page-c" />
Google may follow the chain or stop at B. Always point the canonical directly at the final URL.
Canonical to a redirect:
<!-- Canonicalizing to a URL that itself 301s to somewhere else -->
<link rel="canonical" href="https://example.com/old-page" />
<!-- old-page 301s to /new-page -->
The canonical should point to the final destination, not an intermediate redirect.
Canonical to a noindex page:
<!-- Canonicalizing to a page that has noindex -->
<link rel="canonical" href="https://example.com/noindex-page" />
Contradictory signals. Google may ignore the canonical, or treat the noindex as applying to the whole duplicate cluster and drop the current page from the index too.
Inconsistent canonical with 301:
<!-- URL A 301 redirects to URL C -->
<!-- But URL B has <link rel="canonical" href="URL A"> -->
<!-- Google sees: B -> canonical -> A -> redirect -> C -->
Always align redirects and canonical tags to point to the same final URL.
3. Pagination Handling
Modern approach (preferred): Self-canonical per page
Each paginated page is canonicalized to itself. This is the current Google recommendation.
Google retired the rel="prev" and rel="next" pattern in 2019; it is no longer used as an indexing signal.
<!-- /products?page=1 -->
<link rel="canonical" href="https://example.com/products" />
<!-- /products?page=2 -->
<link rel="canonical" href="https://example.com/products?page=2" />
<!-- /products?page=3 -->
<link rel="canonical" href="https://example.com/products?page=3" />
Only canonicalize page 2+ to page 1 if they contain identical or near-identical content. If each page has 20 unique products, each page has distinct value and should be self-canonical.
When to noindex paginated pages
Noindex paginated pages when:
- Pages beyond 2-3 receive no organic traffic
- Content is highly similar across pages
- You need to conserve crawl budget
Do NOT noindex page 1 or pages that rank for head terms.
4. hreflang and Canonicals Interaction
hreflang tells Google which language/region variant to serve to which users. It interacts with canonicals and must be consistent.
Correct hreflang setup
<!-- On https://example.com/en/products/widget (English, US) -->
<link rel="alternate" hreflang="en-US" href="https://example.com/en/products/widget" />
<link rel="alternate" hreflang="en-GB" href="https://example.com/en-gb/products/widget" />
<link rel="alternate" hreflang="de" href="https://example.com/de/products/widget" />
<link rel="alternate" hreflang="x-default" href="https://example.com/products/widget" />
<link rel="canonical" href="https://example.com/en/products/widget" />
Rules for hreflang:
- All language variants must reference all other language variants (reciprocal)
- hreflang does not change the canonical - each language page canonicals to itself
- Do not canonicalize an FR page to an EN page (that removes the FR page from the index)
- x-default is the fallback for users who don't match any specific language
Common hreflang failure
<!-- WRONG: FR page canonicalizing to the EN URL -->
<!-- On /fr/products/widget: -->
<link rel="canonical" href="https://example.com/en/products/widget" />
<link rel="alternate" hreflang="fr" href="https://example.com/fr/products/widget" />
This tells Google "this FR page is a duplicate of the EN page" and the FR page will not be indexed for French users. Each locale page should be self-canonical.
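Because every variant must list every other variant, generating the tag set from a single locale map is less error-prone than hand-editing each page. A sketch - the locale-map shape is an assumption, not a standard API:

```typescript
// Generate the full reciprocal hreflang set for one piece of content.
// Render the same set on every locale variant of the page.
export function hreflangLinks(
  locales: Record<string, string>, // hreflang code -> absolute URL
  xDefault: string                 // fallback URL for unmatched users
): string[] {
  const links = Object.entries(locales).map(
    ([lang, href]) => `<link rel="alternate" hreflang="${lang}" href="${href}" />`
  );
  links.push(`<link rel="alternate" hreflang="x-default" href="${xDefault}" />`);
  return links;
}
```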
hreflang in XML sitemaps
For large multilingual sites, managing hreflang in sitemaps is more maintainable than per-page HTML tags:
<url>
<loc>https://example.com/en/products/widget</loc>
<xhtml:link rel="alternate" hreflang="en-US" href="https://example.com/en/products/widget"/>
<xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/products/widget"/>
<xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/products/widget"/>
</url>
Add xmlns:xhtml="http://www.w3.org/1999/xhtml" to the <urlset> opening tag.
5. Canonical Validation Checklist
Before deploying canonical changes:
- All canonical tags point to absolute URLs (including protocol)
- Canonical target returns HTTP 200
- Canonical target is not itself noindexed
- Canonical target does not redirect elsewhere
- Canonical tag is in the <head>, not the <body>
- Only one canonical tag per page (duplicate canonicals are ignored)
- Canonical URL form is consistent with your chosen URL standard (www/non-www, trailing slash)
- Sitemap URLs match their declared canonical URLs exactly
- hreflang alternate URLs match the canonical URL for that locale (not a different locale's canonical)
- No canonical loops (A -> B -> A)
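The chain and loop checks from this list can be automated over a map of declared canonicals. A sketch - the input shape (URL to declared canonical target) is an assumption about how you would export the data from a crawl:

```typescript
// Scan a canonical map for chains (A -> B -> C) and loops (A -> B -> A).
// Self-referencing canonicals and direct one-hop canonicals pass.
export function findCanonicalFaults(
  canonicals: Record<string, string>
): { chains: string[]; loops: string[] } {
  const chains: string[] = [];
  const loops: string[] = [];
  for (const [url, target] of Object.entries(canonicals)) {
    if (target === url) continue; // self-canonical is fine
    const next = canonicals[target];
    if (next === undefined || next === target) continue; // resolves in one hop
    if (next === url) loops.push(url); // target points back at us
    else chains.push(url);             // target canonicals somewhere else again
  }
  return { chains, loops };
}
```

Wiring this into CI catches canonical regressions before they reach Googlebot.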
Frequently Asked Questions
What is technical-seo?
Use this skill when working on technical SEO infrastructure - crawlability, indexing, XML sitemaps, canonical URLs, robots.txt, redirect chains, rendering strategies (SSR/SSG/ISR/CSR), crawl budget optimization, and search engine rendering. Triggers on fixing indexing issues, configuring crawl directives, choosing rendering strategies for SEO, debugging Google Search Console errors, or auditing site architecture for search engines.
How do I install technical-seo?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill technical-seo in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support technical-seo?
technical-seo works with claude-code, gemini-cli, openai-codex, mcp. Install it once and use it across any supported AI coding agent.