programmatic-seo
Use this skill when building programmatic SEO pages at scale - template-based page generation, data-driven landing pages, automated internal linking, and avoiding thin content or doorway page penalties. Triggers on generating thousands of location pages, comparison pages, tool pages, or any template-driven SEO content strategy that creates pages programmatically from data sources.
What is programmatic-seo?
programmatic-seo is a production-ready AI agent skill for claude-code, gemini-cli, openai-codex, and 1 more. Building programmatic SEO pages at scale - template-based page generation, data-driven landing pages, automated internal linking, and avoiding thin content or doorway page penalties.
Quick Facts
| Field | Value |
|---|---|
| Category | marketing |
| Version | 0.1.0 |
| Platforms | claude-code, gemini-cli, openai-codex, mcp |
| License | MIT |
How to Install
- Make sure you have Node.js installed on your machine.
- Run the following command in your terminal:
npx skills add AbsolutelySkilled/AbsolutelySkilled --skill programmatic-seo
- The programmatic-seo skill is now available in your AI coding agent (Claude Code, Gemini CLI, OpenAI Codex, etc.).
Overview
Programmatic SEO (pSEO) is the practice of generating large numbers of search-optimized pages from templates and structured data sources, rather than writing each page by hand. Companies like Zapier (app integration pages), Nomadlist (city pages), and Wise (currency converter pages) capture millions of long-tail search visitors this way. The central challenge is creating genuine value on every page - Google actively penalizes thin content and doorway pages, so raw template fill without unique data is not enough.
Tags
seo programmatic-seo page-generation templates landing-pages scale
Platforms
- claude-code
- gemini-cli
- openai-codex
- mcp
Frequently Asked Questions
What is programmatic-seo?
Use this skill when building programmatic SEO pages at scale - template-based page generation, data-driven landing pages, automated internal linking, and avoiding thin content or doorway page penalties. Triggers on generating thousands of location pages, comparison pages, tool pages, or any template-driven SEO content strategy that creates pages programmatically from data sources.
How do I install programmatic-seo?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill programmatic-seo in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support programmatic-seo?
This skill works with claude-code, gemini-cli, openai-codex, mcp. Install it once and use it across any supported AI coding agent.
Maintainers
Generated from AbsolutelySkilled
SKILL.md
Programmatic SEO
Programmatic SEO (pSEO) is the practice of generating large numbers of search-optimized pages from templates and structured data sources, rather than writing each page by hand. Companies like Zapier (app integration pages), Nomadlist (city pages), and Wise (currency converter pages) capture millions of long-tail search visitors this way. The central challenge is creating genuine value on every page - Google actively penalizes thin content and doorway pages, so raw template fill without unique data is not enough.
When to use this skill
Trigger this skill when the user:
- Wants to build pSEO pages at scale (location pages, comparison pages, tool pages)
- Is designing a template for data-driven landing pages
- Needs to generate pages programmatically from a database or spreadsheet
- Wants to implement automated internal linking between a large set of pages
- Is setting up a seed-and-scale launch strategy for a pSEO project
- Needs to avoid thin content or doorway page Google penalties
- Wants to monitor programmatic page performance in Search Console at scale
- Is configuring sitemap indexes or crawl budget for thousands of pages
Do NOT trigger this skill for:
- Writing individual pieces of editorial content or blog posts
- Keyword research and topic ideation (outside the context of pSEO template planning)
Key principles
Every page must offer unique value beyond template fill - Swapping only the city name is not enough. Each page needs at least one unique data zone: local statistics, real pricing, user reviews, or specific inventory. Without it, Google will eventually deindex the entire batch.
Data quality is the moat - The uniqueness of your pages flows entirely from the uniqueness of your data. Proprietary datasets (scraped, licensed, or user-generated) create defensible pSEO. Generic public data creates generic pages that get deindexed.
Internal linking between programmatic pages is the growth engine - A page Google cannot crawl to is a page that does not rank. Automated hub-and-spoke internal linking ensures every page is reachable, distributes PageRank through the cluster, and signals topical authority.
Monitor for thin content at scale with automated quality gates - At thousands of pages you cannot review manually. Build quality score checks into the generation pipeline: minimum word count, minimum unique data fields populated, duplicate content ratio. Block pages that fail before they go live.
Start small, validate, then scale - Publish a batch of 50-100 pages first. Check Search Console for indexing coverage and ranking signals after 4-6 weeks. Only scale to thousands once the template proves out in real search data.
Core concepts
pSEO page types map to user search intent patterns:
| Type | Example | Unique data needed |
|---|---|---|
| Location page | "Best accountants in Austin TX" | Local listings, reviews, pricing |
| Comparison page | "Notion vs Airtable" | Feature tables, pricing diff, use-case match |
| Tool page | "USD to EUR converter" | Live exchange rate, calculation output |
| Aggregator page | "Top 10 remote-friendly cities" | Ranked dataset with per-row metrics |
| Glossary page | "What is a chargeback" | Definition, examples, related terms |
Template anatomy - every pSEO template has two zones:
- Unique data zones: sections populated from per-page data fields (statistics, lists, prices, reviews). These are what make pages distinct from each other.
- Boilerplate zones: shared headers, footers, explanatory copy, CTAs. These are identical across all pages.
The ratio of unique data to boilerplate is your "content diversity score." Aim for at least 40% of rendered content to come from unique data. Below 20% risks a thin content penalty at scale.
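The diversity score can be approximated by comparing the length of a page's rendered text against the boilerplate shared across all pages. A minimal sketch (the function name and inputs are illustrative, not part of the skill):

```typescript
// Approximate content diversity: the share of a page's rendered text
// that does not come from boilerplate repeated on every page.
export function contentDiversityScore(
  renderedText: string,    // full visible text of one rendered page
  boilerplateText: string, // text repeated verbatim across all pages
): number {
  if (renderedText.length === 0) return 0;
  const unique = renderedText.length - boilerplateText.length;
  return Math.max(0, unique / renderedText.length);
}
```

A score of 0.4 or higher meets the target; below 0.2 is thin-content territory.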
The thin content line is the threshold Google uses to decide whether a page adds enough value to deserve indexing. A page crosses the line when: (a) duplicate content ratio is high across the batch, (b) user intent cannot be satisfied without leaving the page, or (c) the only differentiation is a keyword swap in the title tag.
Data sources for pSEO (ranked by defensibility):
- User-generated content (reviews, submissions) - highest moat
- Licensed datasets (APIs with paid access)
- First-party data (your own product database)
- Scraped/aggregated public data - lowest moat, highest risk
Batch publishing strategy - publish in cohorts rather than all at once. A sudden spike of thousands of new pages triggers Google's quality review systems. Publish 100 pages/day and let Google crawl and index them naturally.
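One way to implement cohort publishing is to precompute a staggered publish date per slug at generation time. A sketch under stated assumptions (the helper name and default batch size are illustrative):

```typescript
// Assign each page slug a publish date so a large batch goes live in
// daily cohorts instead of one deploy. perDay follows the 100/day guidance.
export function assignPublishCohorts(
  slugs: string[],
  perDay = 100,
  start: Date = new Date(),
): Map<string, Date> {
  const schedule = new Map<string, Date>();
  slugs.forEach((slug, index) => {
    const dayOffset = Math.floor(index / perDay); // cohort number
    schedule.set(slug, new Date(start.getTime() + dayOffset * 86_400_000));
  });
  return schedule;
}
```

A scheduled job can then publish only the slugs whose date has passed.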
Common tasks
Design a pSEO template with required unique data zones
Before writing any code, define the template data model. Every field that changes per page is a "slot." Every field that is the same across all pages is "boilerplate." A good rule of thumb: at least 5 distinct slot fields per page.
// Template data model for a "city + service" pSEO page
interface LocationPageData {
// Unique slots - must come from data source
city: string;
state: string;
providerCount: number;
averagePrice: number;
topProviders: Provider[];
localStat: string; // e.g. "Austin has 340 licensed accountants"
nearbyLocations: string[]; // for internal linking
// Derived (computed, not boilerplate)
slug: string; // e.g. "accountants-austin-tx"
canonicalUrl: string;
metaDescription: string; // dynamically composed from slots
}
Validate that your data source can populate every slot before writing a single template. If a slot is empty for 30%+ of pages, redesign the template to make that slot optional or remove it.
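A slot-coverage check makes that validation concrete. A hedged sketch (the function and field handling are assumptions, not part of the skill):

```typescript
// Measure what fraction of data rows populate each slot. Slots filled on
// fewer than 70% of rows should be made optional or removed.
export function slotCoverage(
  rows: Array<Record<string, unknown>>,
  slots: string[],
): Record<string, number> {
  const coverage: Record<string, number> = {};
  for (const slot of slots) {
    const filled = rows.filter((row) => {
      const value = row[slot];
      return value !== null && value !== undefined && value !== '' &&
        (!Array.isArray(value) || value.length > 0);
    }).length;
    coverage[slot] = rows.length ? filled / rows.length : 0;
  }
  return coverage;
}
```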
Build a data pipeline for page generation with Next.js
Use generateStaticParams (App Router) or getStaticPaths (Pages Router) to drive
static generation from your data source.
// app/[city]/[service]/page.tsx - Next.js App Router
import { db } from '@/lib/db';
export async function generateStaticParams() {
const locations = await db.locations.findMany({
where: { providerCount: { gte: 5 } }, // quality gate: skip thin pages
select: { citySlug: true, serviceSlug: true },
});
return locations.map((loc) => ({
city: loc.citySlug,
service: loc.serviceSlug,
}));
}
export async function generateMetadata({ params }: Props) {
const data = await getLocationPageData(params.city, params.service);
return {
title: `Best ${data.serviceLabel} in ${data.cityName} - Top ${data.providerCount} Providers`,
description: data.metaDescription,
alternates: { canonical: data.canonicalUrl },
};
}
export default async function LocationPage({ params }: Props) {
const data = await getLocationPageData(params.city, params.service);
return <LocationTemplate data={data} />;
}
Use incremental static regeneration (ISR) with a revalidate interval for pages where data changes frequently (prices, counts). This avoids full rebuilds for large pSEO sites.
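In the App Router, the revalidation interval is a per-route segment config export; the interval value here is an example, not a recommendation for every data source:

```typescript
// app/[city]/[service]/page.tsx - route segment config
// Re-render each page at most once per day in the background; use a
// shorter interval (e.g. 3600) for fast-moving data like live prices.
export const revalidate = 86400; // seconds
```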
Implement automated internal linking between programmatic pages
See references/internal-linking-automation.md for the full hub-and-spoke algorithm.
The minimum viable implementation: each page links to its geographic/categorical siblings.
// lib/related-pages.ts
export async function getRelatedPages(
currentPage: LocationPageData,
limit = 6
): Promise<RelatedPage[]> {
// Strategy 1: same service, nearby cities (geographic proximity)
const nearbyCities = await db.locations.findMany({
where: {
serviceSlug: currentPage.serviceSlug,
stateSlug: currentPage.stateSlug,
citySlug: { not: currentPage.citySlug },
},
orderBy: { providerCount: 'desc' },
take: limit,
select: { cityName: true, citySlug: true, serviceSlug: true, providerCount: true },
});
return nearbyCities.map((loc) => ({
title: `${currentPage.serviceLabel} in ${loc.cityName}`,
href: `/${loc.citySlug}/${loc.serviceSlug}`,
signal: `${loc.providerCount} providers`,
}));
}
Inject this into every template as a "Related locations" section. This creates a full internal link graph across the pSEO cluster.
Set up quality gates to prevent thin pages from going live
A thin page that gets published is harder to remove than one that never went live. Add a quality score check to the generation pipeline.
// lib/quality-gate.ts
interface QualityScore {
passes: boolean;
score: number;
failReasons: string[];
}
export function scoreLocationPage(data: LocationPageData): QualityScore {
const failReasons: string[] = [];
let score = 0;
if (data.providerCount >= 5) score += 30;
else failReasons.push(`Too few providers: ${data.providerCount} (min 5)`);
if (data.topProviders.length >= 3) score += 25;
else failReasons.push('Not enough top provider data');
if ((data.localStat?.length ?? 0) > 20) score += 20;
else failReasons.push('Missing or weak local stat');
if (data.averagePrice > 0) score += 15;
else failReasons.push('Missing average price data');
if (data.nearbyLocations.length >= 3) score += 10;
else failReasons.push('Not enough nearby locations for internal linking');
return { passes: score >= 70, score, failReasons };
}
// In generateStaticParams - filter out pages below threshold
const locations = rawLocations.filter((loc) => {
const { passes } = scoreLocationPage(loc);
if (!passes) console.warn(`Skipping thin page: ${loc.slug}`);
return passes;
});
Create a seed-and-scale launch strategy
Start with a "seed" batch to validate template effectiveness before scaling.
Week 1-2 (Seed):
- Publish 50-100 pages in the highest-value segment (best data quality, highest search volume)
- Submit to Google Search Console via sitemap
- Set up rank tracking for a sample of target keywords
Week 3-6 (Observe):
- Monitor Search Console Coverage report for indexing issues
- Check for "Crawled - currently not indexed" or "Duplicate, Google chose different canonical"
- Track ranking movement for seeded pages
Week 6+ (Scale decision):
- If seed pages index cleanly and show ranking signal: begin scaling (100-200 pages/day)
- If pages are not indexing: audit template quality, improve unique data, fix before scaling
- Never publish thousands of pages while coverage issues are unresolved
Monitor programmatic page performance at scale
At scale you cannot review pages individually. Use Search Console API to monitor programmatic page performance across the cluster.
// scripts/pSEO-health-check.ts
// Requires: npm install googleapis
import { google } from 'googleapis';
const searchconsole = google.searchconsole('v1');
export async function getPseoClusterMetrics(
siteUrl: string,
urlPattern: string, // e.g. '/city/' to filter pSEO cluster
days = 28
): Promise<ClusterMetrics> {
const endDate = new Date().toISOString().split('T')[0];
const startDate = new Date(Date.now() - days * 86400000).toISOString().split('T')[0];
const response = await searchconsole.searchanalytics.query({
siteUrl,
requestBody: {
startDate,
endDate,
dimensions: ['page'],
dimensionFilterGroups: [{
filters: [{ dimension: 'page', operator: 'contains', expression: urlPattern }],
}],
rowLimit: 25000,
},
});
const rows = response.data.rows ?? [];
const zeroImpression = rows.filter((r) => (r.impressions ?? 0) === 0);
return {
totalPages: rows.length,
pagesWithImpressions: rows.length - zeroImpression.length,
zeroImpressionPages: zeroImpression.length,
avgCtr: rows.length ? rows.reduce((sum, r) => sum + (r.ctr ?? 0), 0) / rows.length : 0,
avgPosition: rows.length ? rows.reduce((sum, r) => sum + (r.position ?? 0), 0) / rows.length : 0,
};
}
Handle indexing for large pSEO sites (sitemap index + crawl budget)
A single sitemap file supports at most 50,000 URLs. For large pSEO sites, use a sitemap index that points to segmented sitemap files.
// app/sitemap-index.xml/route.ts
export async function GET() {
const services = await db.services.findMany({ select: { slug: true } });
const sitemapIndex = `<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${services.map((s) => `
<sitemap>
<loc>https://example.com/sitemaps/${s.slug}.xml</loc>
<lastmod>${new Date().toISOString().split('T')[0]}</lastmod>
</sitemap>`).join('')}
</sitemapindex>`;
return new Response(sitemapIndex, {
headers: { 'Content-Type': 'application/xml' },
});
}
Crawl budget tips for large pSEO sites:
- Exclude zero-value internal pages from sitemap (admin, user profiles, search results)
- Use robots.txt to block faceted navigation and filter URLs that generate duplicates
- Prioritize your highest-quality pSEO pages in sitemap <priority> tags (0.8 for top pages)
- Monitor crawl stats in Search Console > Settings > Crawl stats
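In a Next.js App Router project, the robots.txt rules can be generated from an `app/robots.ts` file. A sketch - the blocked paths are examples, not a recommendation for every site:

```typescript
import type { MetadataRoute } from 'next';

// app/robots.ts - Next.js serves /robots.txt from this file.
// Block internal search and filter/sort parameter URLs that would
// otherwise waste crawl budget on duplicate variants.
export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      disallow: ['/search', '/*?sort=', '/*?region='],
    },
    sitemap: 'https://example.com/sitemap-index.xml',
  };
}
```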
Anti-patterns / common mistakes
| Mistake | Why it's wrong | What to do instead |
|---|---|---|
| Only swapping the keyword in the title | Google detects near-duplicate content at scale and deindexes the whole cluster | Ensure at least 5 distinct data fields differ per page |
| Publishing thousands of pages on day one | Sudden index spikes trigger quality filters; many pages won't index at all | Seed 50-100 pages, validate coverage, then scale gradually |
| No quality gate before generation | Thin pages for cities with 1-2 providers go live, damaging domain quality signals | Score every page before publishing; skip pages below threshold |
| Ignoring Search Console Coverage report | Indexing issues compound silently at scale | Check Coverage weekly for the first 3 months after launch |
| AI-generated filler for thin data slots | LLM filler that sounds generic counts as thin content - Google's quality systems detect it | Either get real data or do not create pages where data is absent |
| Flat URL structure for thousands of pages | Crawl budget exhausted on leaf pages before Google reaches all of them | Use hierarchical URLs (/service/state/city) with clear hub pages |
| No canonical tags on filtered/sorted variants | Pagination and filter parameters create duplicate URLs | Add canonical pointing to the base pSEO URL on all filter variants |
Gotchas
Scaling before validating seed batch - The most expensive mistake is generating 10,000 pages before confirming the first 100 index properly. Google's quality filters can suppress the entire batch silently. Always wait for Search Console coverage confirmation on your seed batch before scaling.
AI-generated text filling thin data slots - When real data is sparse, it's tempting to use an LLM to pad thin fields. Google's quality systems detect generic LLM filler as thin content even when it's grammatically fluent. Either get real data or suppress the page - don't fill slots with invented text.
Batch publishing not throttled on initial launch - Publishing thousands of pages in a single deploy triggers Google's spam detection systems. Many pages end up in "Discovered - currently not indexed" limbo for months. Throttle the initial batch to 100-200 pages/day using a scheduled job or incremental static regeneration rather than a single full build dump.
Canonical tags missing on filter/sort URL variants - Programmatic pages often have filter or sort variants (?sort=price, ?region=west) that generate duplicate content. Without canonical tags pointing to the base URL, Google indexes the duplicates and splits PageRank across variants.
Content diversity score measured at template level, not rendered level - A template can look diverse in code but render nearly identical pages if the data source has uniform values. Measure diversity on rendered HTML output, not on template logic.
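One common way to measure diversity on rendered output (an approach choice, not prescribed by this skill) is word-shingle Jaccard similarity between rendered page pairs; values near 1.0 indicate near-duplicates:

```typescript
// Jaccard similarity of word n-gram (shingle) sets between two rendered
// pages' visible text. High similarity across a batch signals thin content.
export function shingleSimilarity(a: string, b: string, n = 5): number {
  const shingles = (text: string): Set<string> => {
    const words = text.toLowerCase().split(/\s+/).filter(Boolean);
    const set = new Set<string>();
    for (let i = 0; i + n <= words.length; i++) {
      set.add(words.slice(i, i + n).join(' '));
    }
    return set;
  };
  const sa = shingles(a);
  const sb = shingles(b);
  if (sa.size === 0 && sb.size === 0) return 1; // two empty pages are identical
  let intersection = 0;
  for (const s of sa) if (sb.has(s)) intersection++;
  const union = sa.size + sb.size - intersection;
  return union === 0 ? 0 : intersection / union;
}
```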
References
For deep-dive content on specific sub-topics, load the relevant references file:
- references/template-generation.md - Template design patterns, data sourcing strategies, Next.js/Astro bulk static generation, quality scoring algorithms, batch publishing cadence. Load when designing or implementing the page generation pipeline.
- references/internal-linking-automation.md - Hub-and-spoke linking patterns, related pages algorithms (geographic proximity, categorical similarity), breadcrumb generation, contextual link injection, silo architecture, link graph visualization. Load when implementing internal linking at scale.
Only load a references file when the current task requires it.
References
internal-linking-automation.md
Internal Linking Automation for pSEO
Automated internal linking is the connective tissue of a pSEO site. Without it, thousands of pages exist in isolation - hard for Google to crawl and impossible to pass PageRank between. With it, the cluster becomes a topical authority signal.
Hub-and-spoke architecture
The fundamental pSEO link structure. A small number of "hub" pages (high editorial value, strong links) fan out to many "spoke" pages (programmatic, data-driven).
┌─────────────────┐
│ Root Hub │
│ /accountants/ │
└────────┬────────┘
│
┌─────────────────┼──────────────────┐
│ │ │
┌──────▼──────┐ ┌───────▼──────┐ ┌───────▼──────┐
│ State Hub │ │ State Hub │ │ State Hub │
│ /TX/ │ │ /CA/ │ │ /NY/ │
└──────┬──────┘ └──────────────┘ └──────────────┘
│
┌──────┴──────────────────────────────────┐
│ │ │ │ │
Austin    Dallas    Houston    San Antonio
 spoke     spoke      spoke        spoke
Link direction rules:
- Hubs always link DOWN to spokes (navigational breadth)
- Spokes always link UP to their hub (authority concentration)
- Spokes link SIDEWAYS to nearby spokes (topical clustering)
- Never link from spoke to spoke across different hubs without context
Related pages algorithm
The algorithm that decides which pages link to each other on the same level.
Geographic proximity (location pSEO)
// lib/related-pages/geographic.ts
import { db } from '@/lib/db';
interface RelatedPage {
title: string;
href: string;
signal: string; // "120 providers" or "from $45/hr"
}
export async function getGeographicRelatedPages(
citySlug: string,
stateSlug: string,
serviceSlug: string,
limit = 6
): Promise<RelatedPage[]> {
// Priority 1: same state, same service (most relevant)
const sameState = await db.locationPages.findMany({
where: {
stateSlug,
serviceSlug,
citySlug: { not: citySlug },
qualityScore: { gte: 65 },
},
orderBy: { providerCount: 'desc' }, // show richest pages first
take: limit,
});
if (sameState.length >= limit) {
return sameState.map(toRelatedPage);
}
// Priority 2: different state, same service (fill remaining slots)
const otherStates = await db.locationPages.findMany({
where: {
serviceSlug,
stateSlug: { not: stateSlug },
citySlug: { not: citySlug },
qualityScore: { gte: 65 },
},
orderBy: { searchVolume: 'desc' },
take: limit - sameState.length,
});
return [...sameState, ...otherStates].map(toRelatedPage);
}
function toRelatedPage(loc: {
cityName: string;
citySlug: string;
serviceSlug: string;
providerCount: number;
}): RelatedPage {
return {
title: loc.cityName,
href: `/${loc.serviceSlug}/${loc.citySlug}`,
signal: `${loc.providerCount} providers`,
};
}
Categorical similarity (comparison/tool pSEO)
// lib/related-pages/categorical.ts
export async function getCategoricalRelatedPages(
currentSlug: string,
categoryTags: string[],
limit = 6
): Promise<RelatedPage[]> {
// Find pages sharing the most tags with the current page
const candidates = await db.comparisonPages.findMany({
where: {
slug: { not: currentSlug },
tags: { hasSome: categoryTags },
qualityScore: { gte: 65 },
},
});
// Score by tag overlap count
const scored = candidates
.map((page) => ({
page,
overlap: page.tags.filter((t) => categoryTags.includes(t)).length,
}))
.sort((a, b) => b.overlap - a.overlap)
.slice(0, limit);
return scored.map(({ page }) => ({
title: page.title,
href: `/${page.slug}`,
signal: page.shortSummary,
}));
}
Breadcrumb generation
Breadcrumbs serve dual purpose: user navigation and internal linking to hub pages. Every breadcrumb link is an internal link with descriptive anchor text.
// lib/breadcrumbs.ts
export interface Breadcrumb {
label: string;
href: string | null; // null for the current page (no link)
}
export function buildLocationBreadcrumbs(
serviceLabel: string,
serviceSlug: string,
stateLabel: string,
stateSlug: string,
cityLabel: string,
): Breadcrumb[] {
return [
{ label: 'Home', href: '/' },
{ label: serviceLabel, href: `/${serviceSlug}/` },
{ label: stateLabel, href: `/${serviceSlug}/${stateSlug}/` },
{ label: cityLabel, href: null }, // current page - no link
];
}
// React component
// components/Breadcrumbs.tsx
import type { Breadcrumb } from '@/lib/breadcrumbs';
export function Breadcrumbs({ crumbs }: { crumbs: Breadcrumb[] }) {
return (
<nav aria-label="Breadcrumb">
<ol itemScope itemType="https://schema.org/BreadcrumbList">
{crumbs.map((crumb, index) => (
<li
key={crumb.label}
itemScope
itemType="https://schema.org/ListItem"
itemProp="itemListElement"
>
{crumb.href ? (
<a href={crumb.href} itemProp="item">
<span itemProp="name">{crumb.label}</span>
</a>
) : (
<span itemProp="name">{crumb.label}</span>
)}
<meta itemProp="position" content={String(index + 1)} />
</li>
))}
</ol>
</nav>
);
}
Add BreadcrumbList schema markup to the breadcrumbs. Google uses this to display breadcrumb trails in SERPs, which improves click-through rate for pSEO pages.
Contextual link injection
Beyond a "related pages" section, inject links directly into body copy where relevant. This is the highest-value internal link type because anchor text is natural and in-context.
// lib/contextual-links.ts
interface ContextualLinkConfig {
keyword: string; // trigger phrase in body copy
href: string; // destination URL
anchorText: string; // anchor text to use (may differ from keyword)
maxOccurrences: number; // how many times to inject per page (usually 1)
}
export function injectContextualLinks(
htmlContent: string,
links: ContextualLinkConfig[]
): string {
let result = htmlContent;
for (const link of links) {
let count = 0;
const regex = new RegExp(`\\b(${escapeRegex(link.keyword)})\\b`, 'gi');
result = result.replace(regex, (match) => {
if (count >= link.maxOccurrences) return match;
count++;
return `<a href="${link.href}">${link.anchorText}</a>`;
});
}
return result;
}
function escapeRegex(str: string): string {
return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
// Usage in template rendering
const links: ContextualLinkConfig[] = await db.contextualLinks.findMany({
where: { pageType: 'location', serviceSlug: data.serviceSlug },
});
const linkedContent = injectContextualLinks(data.bodyHtml, links);
Limit contextual injection to 1 occurrence per keyword per page. Over-linking looks spammy and can trigger Google's link spam classifier. One natural link outperforms three forced ones.
Silo architecture
A silo groups topically related pages so PageRank flows within the topic cluster rather than leaking to unrelated content.
Hard silo (strict): pages within a silo only link to other pages in the same silo. Used for highly competitive niches where topical authority concentration matters most.
Soft silo (practical): pages primarily link within their silo, but cross-silo links are allowed at the hub level. This is the recommended approach for most pSEO sites.
// lib/silo.ts - enforce soft silo rules
export function validateInternalLink(
sourceSilo: string,
targetSilo: string,
sourceDepth: number, // 0 = root, 1 = hub, 2 = spoke
): { allowed: boolean; reason?: string } {
// Rule 1: same-silo links always allowed
if (sourceSilo === targetSilo) {
return { allowed: true };
}
// Rule 2: root and hub pages can cross-link (soft silo)
if (sourceDepth <= 1) {
return { allowed: true };
}
// Rule 3: spokes cannot cross-link to other silos
return {
allowed: false,
reason: `Spoke page in silo "${sourceSilo}" cannot link to silo "${targetSilo}"`,
};
}
URL structure reflects silo structure:
/accountants/ ← root hub (silo: accountants)
/accountants/texas/ ← state hub
/accountants/texas/austin/ ← city spoke
/accountants/texas/dallas/ ← city spoke
/bookkeepers/ ← separate silo
/bookkeepers/texas/
/bookkeepers/texas/austin/
Link graph visualization
For large pSEO sites, visualize the link graph to spot orphaned pages, thin clusters, and over-linked hub pages.
// scripts/generate-link-graph.ts
// Outputs a Graphviz DOT file for visualization
import { db } from '@/lib/db';
async function generateLinkGraph(): Promise<void> {
const pages = await db.locationPages.findMany({
include: { outboundLinks: { include: { target: true } } },
});
const lines: string[] = ['digraph pSEO {', ' rankdir=TB;'];
for (const page of pages) {
const depth = page.depth; // 0=root, 1=state, 2=city
const color = depth === 0 ? 'red' : depth === 1 ? 'orange' : 'lightblue';
lines.push(` "${page.slug}" [label="${page.slug}" fillcolor="${color}" style=filled];`);
for (const link of page.outboundLinks) {
lines.push(` "${page.slug}" -> "${link.target.slug}";`);
}
}
lines.push('}');
const fs = await import('fs');
fs.writeFileSync('reports/link-graph.dot', lines.join('\n'));
console.log('Link graph written to reports/link-graph.dot');
console.log('Render: dot -Tsvg reports/link-graph.dot -o reports/link-graph.svg');
}
generateLinkGraph();
Metrics to check in the graph:
| Metric | Healthy | Concerning |
|---|---|---|
| Orphaned pages (0 inbound links) | < 1% of total | > 5% - check sitemap and hub pages |
| Hub inbound link concentration | Top 10% of pages get 60-70% of links | Top 1% gets 90%+ (bottleneck) |
| Average link depth (clicks from root) | < 3 for all spoke pages | > 4 for any page (crawl risk) |
| Cross-silo links from spokes | 0 (hard silo) or < 5% (soft silo) | > 10% dilutes topical authority |
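Average link depth can be computed directly from the link graph with a breadth-first search. A sketch using an in-memory edge map (a simplification of the database models used elsewhere in this reference):

```typescript
// BFS over the internal link graph: click depth from the root for every
// reachable page. Pages deeper than 3 clicks are crawl risks; pages
// missing from the result map entirely are orphaned.
export function linkDepths(
  root: string,
  edges: Map<string, string[]>, // slug -> outbound link slugs
): Map<string, number> {
  const depth = new Map<string, number>([[root, 0]]);
  const queue: string[] = [root];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const next of edges.get(current) ?? []) {
      if (!depth.has(next)) {
        depth.set(next, depth.get(current)! + 1);
        queue.push(next);
      }
    }
  }
  return depth;
}
```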
Link audit script
Run periodically to detect internal linking regressions - orphaned pages, broken links, or silo violations introduced during data updates.
// scripts/link-audit.ts
import { db } from '@/lib/db';
interface AuditResult {
orphanedPages: string[];
brokenLinks: Array<{ source: string; target: string }>;
siloViolations: Array<{ source: string; target: string; reason: string }>;
}
export async function runLinkAudit(): Promise<AuditResult> {
const allPages = await db.locationPages.findMany({
include: { outboundLinks: true, inboundLinks: true },
});
const slugSet = new Set(allPages.map((p) => p.slug));
const orphanedPages = allPages
.filter((p) => p.depth > 0 && p.inboundLinks.length === 0)
.map((p) => p.slug);
const brokenLinks: AuditResult['brokenLinks'] = [];
const siloViolations: AuditResult['siloViolations'] = [];
for (const page of allPages) {
for (const link of page.outboundLinks) {
if (!slugSet.has(link.targetSlug)) {
brokenLinks.push({ source: page.slug, target: link.targetSlug });
}
const target = allPages.find((p) => p.slug === link.targetSlug);
if (target && page.depth >= 2 && page.silo !== target.silo) {
siloViolations.push({
source: page.slug,
target: target.slug,
reason: `Cross-silo spoke link: ${page.silo} -> ${target.silo}`,
});
}
}
}
return { orphanedPages, brokenLinks, siloViolations };
}
// CI integration - fail build if audit finds regressions
const result = await runLinkAudit();
const issues = result.orphanedPages.length + result.brokenLinks.length + result.siloViolations.length;
if (issues > 0) {
console.error('Link audit failed:', JSON.stringify(result, null, 2));
process.exit(1);
}
console.log('Link audit passed. No issues found.');
Run this in CI before deploying any data update that adds, removes, or restructures pSEO pages. A broken internal link graph is silent in production but compounds over time as Google's crawl model of your site diverges from reality.
template-generation.md
pSEO Template Generation
Deep reference for designing templates, sourcing data, generating pages at scale, and enforcing quality before publication.
Template anatomy
A pSEO template has three zones that must be explicitly designed:
┌─────────────────────────────────────────────┐
│ HEADER FORMULA (semi-unique) │
│ H1: [Service] in [City], [State] │
│ Meta: Best [Service] in [City] | [Brand] │
├─────────────────────────────────────────────┤
│ UNIQUE DATA ZONES (per-page data) │
│ ● Local statistics / market data │
│ ● Provider list / inventory / prices │
│ ● User reviews / ratings │
│ ● Comparison table rows │
├─────────────────────────────────────────────┤
│ SUPPORTING CONTENT (boilerplate) │
│ ● How it works explanation │
│ ● FAQs (partially templatized) │
│ ● Trust signals (partly unique) │
├─────────────────────────────────────────────┤
│ INTERNAL LINKS (auto-generated) │
│ ● Related pages sidebar/section │
│ ● Breadcrumb navigation │
│ ● Hub page back-link │
├─────────────────────────────────────────────┤
│ CTA (boilerplate + personalized) │
│ ● Primary action tied to page context │
└─────────────────────────────────────────────┘
Header formula patterns:
| Page type | H1 formula | Title tag formula |
|---|---|---|
| Location | Best [Service] in [City], [State] | Top [N] [Service] in [City] ([Year]) |
| Comparison | [Tool A] vs [Tool B]: Full Comparison | [Tool A] vs [Tool B] - Which is Better? |
| Tool/Calculator | [Currency] to [Currency] Converter | Convert [Currency] to [Currency] - Live Rate |
| Aggregator | Best [Category] in [Location] | [N] Best [Category] in [Location] ([Year]) |
| Glossary | What is [Term]? Definition & Examples | [Term] Definition - [Brand] |
Always include the year in title tags for pages where freshness is a ranking factor (comparison pages, top-N aggregators). Use a `currentYear` slot populated at build time - never hard-code it.
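The formula slots above can be rendered with a tiny interpolator. This is a sketch: the `{slot}` syntax and the function names (`renderFormula`, `buildTitleTag`) are illustrative choices, not part of any framework.

```typescript
// Minimal formula renderer: substitutes {slot} placeholders from a data record.
type Slots = Record<string, string | number>;

export function renderFormula(formula: string, slots: Slots): string {
  return formula.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    const value = slots[key];
    if (value === undefined) {
      // Fail loudly at build time - never ship "undefined" inside a title tag
      throw new Error(`Missing slot: ${key}`);
    }
    return String(value);
  });
}

// currentYear is injected once per build, never hard-coded in the formula string
export function buildTitleTag(formula: string, slots: Slots): string {
  return renderFormula(formula, { ...slots, currentYear: new Date().getFullYear() });
}
```

Usage: `buildTitleTag("Top {n} {service} in {city} ({currentYear})", { n: 10, service: "Plumbers", city: "Austin" })`. Throwing on a missing slot turns bad data into a build failure instead of a thin, broken page.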
Data sourcing strategies
Tier 1 - Proprietary / user-generated data (highest moat)
Data that only you have. Google cannot find this elsewhere, so pages are inherently unique.
- Product database: inventory, pricing, SKU attributes
- User reviews and ratings: collected via your platform
- Transaction data: average deal size, volume, completion rate
- Behavioral data: "most searched for in [city]" signals
Tier 2 - Licensed dataset APIs
Paid APIs where the license restricts redistribution. Other pSEO competitors cannot replicate your exact dataset.
- Yelp Fusion API - business listings, reviews, hours
- Google Places API - location data, ratings, photos
- Clearbit - company enrichment data for B2B pSEO
- SerpAPI - SERP data for comparison research pages
- Numerator / Nielsen - consumer market data (expensive, high moat)
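As a sketch of Tier 2 sourcing, here is a request builder for Yelp Fusion's business search endpoint. The endpoint URL and bearer-token header follow Yelp's public documentation; the helper name, chosen parameters, and the commented response handling are illustrative assumptions.

```typescript
// Builds (but does not send) a Yelp Fusion business-search request.
// Keeping the pure builder separate from the fetch makes it testable offline.
const YELP_SEARCH_URL = "https://api.yelp.com/v3/businesses/search";

export function buildYelpSearchRequest(location: string, category: string, apiKey: string) {
  const url = new URL(YELP_SEARCH_URL);
  url.searchParams.set("location", location);   // e.g. "Austin, TX"
  url.searchParams.set("categories", category); // e.g. "plumbing"
  url.searchParams.set("limit", "20");
  return {
    url: url.toString(),
    init: { headers: { Authorization: `Bearer ${apiKey}` } },
  };
}

// At build time (sketch - response shape per Yelp's docs):
// const { url, init } = buildYelpSearchRequest("Austin, TX", "plumbing", process.env.YELP_API_KEY!);
// const res = await fetch(url, init);
// const { businesses } = await res.json();
```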
Tier 3 - Aggregated public data (lowest moat)
Free and open data that anyone can use. Low moat but fast to get started.
- US Census Bureau - population, income, demographics by city
- BLS.gov - labor market data, wage data by metro
- Data.gov - government open datasets by topic
- Wikipedia data dumps - useful for glossary and entity pages
- OpenStreetMap - geographic data for location pages
Data freshness requirements by page type:
| Page type | Stale threshold | Update strategy |
|---|---|---|
| Tool/calculator | 1 day | ISR with short revalidate |
| Price comparison | 1 week | ISR with weekly revalidate |
| Location listing | 1 month | Monthly rebuild or ISR |
| Aggregator/ranking | 3 months | Quarterly rebuild |
| Glossary | 6-12 months | Manual trigger only |
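The freshness table above can live in code as a single config map, so route files never hard-code a revalidate number. The key names here are illustrative; `false` matches Next.js's convention for disabling time-based revalidation.

```typescript
// One source of truth for revalidation windows, mirroring the freshness table.
export const REVALIDATE_SECONDS = {
  tool: 60 * 60 * 24,                 // 1 day
  priceComparison: 60 * 60 * 24 * 7,  // 1 week
  locationListing: 60 * 60 * 24 * 30, // 1 month
  aggregator: 60 * 60 * 24 * 90,      // 3 months
  glossary: false,                    // manual trigger only
} as const;

export type PageType = keyof typeof REVALIDATE_SECONDS;

// In a route file:
// export const revalidate = REVALIDATE_SECONDS.priceComparison;
```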
Next.js bulk static generation patterns
Pattern 1 - Database-driven generateStaticParams
The most common pattern. Query your database at build time to produce the params list.
// app/[service]/[state]/[city]/page.tsx
import { db } from '@/lib/db';
import { notFound } from 'next/navigation';
type Params = { service: string; state: string; city: string };
export async function generateStaticParams(): Promise<Params[]> {
// Only generate pages that pass quality gate
const pages = await db.locationPages.findMany({
where: {
qualityScore: { gte: 70 },
isActive: true,
},
select: {
serviceSlug: true,
stateSlug: true,
citySlug: true,
},
});
return pages.map((p) => ({
service: p.serviceSlug,
state: p.stateSlug,
city: p.citySlug,
}));
}
export const dynamicParams = false; // 404 for any params not in generateStaticParams

Pattern 2 - CSV/JSON file-driven generation (no database)
Good for bootstrapping a pSEO project from a spreadsheet or data export.
// lib/pSEO-data.ts
import fs from 'fs';
import path from 'path';
import { parse } from 'csv-parse/sync';
export interface LocationRow {
city: string;
state: string;
citySlug: string;
stateSlug: string;
providerCount: number;
averagePrice: number;
topProviders: string; // JSON stringified array in CSV
}
export function loadLocationData(): LocationRow[] {
const csvPath = path.join(process.cwd(), 'data', 'locations.csv');
const content = fs.readFileSync(csvPath, 'utf-8');
return parse(content, { columns: true, cast: true });
}
// app/[city]/page.tsx
import { loadLocationData } from '@/lib/pSEO-data';
export async function generateStaticParams() {
const rows = loadLocationData();
return rows
.filter((r) => r.providerCount >= 5) // quality gate
.map((r) => ({ city: r.citySlug }));
}

Pattern 3 - Incremental Static Regeneration for live data
For pSEO pages where data changes regularly (prices, inventory, ratings).
// app/[city]/[service]/page.tsx
export const revalidate = 86400; // revalidate every 24 hours
// Or for on-demand revalidation from a webhook:
// app/api/revalidate/route.ts
import { revalidatePath } from 'next/cache';
export async function POST(request: Request) {
const { secret, citySlug, serviceSlug } = await request.json();
if (secret !== process.env.REVALIDATION_SECRET) {
return Response.json({ error: 'Unauthorized' }, { status: 401 });
}
revalidatePath(`/${citySlug}/${serviceSlug}`);
return Response.json({ revalidated: true });
}

Astro bulk static generation patterns
Astro's content collections and dynamic routes work well for file-based pSEO.
// src/pages/[city]/[service].astro
---
import { getCollection } from 'astro:content';
import LocationTemplate from '@/components/LocationTemplate.astro';
export async function getStaticPaths() {
const locations = await getCollection('locations', ({ data }) => {
return data.providerCount >= 5; // quality gate
});
return locations.map((location) => ({
params: {
city: location.data.citySlug,
service: location.data.serviceSlug,
},
props: { location },
}));
}
const { location } = Astro.props;
---
<LocationTemplate data={location.data} />

For large datasets, Astro's build parallelism can be faster than Next.js static generation when producing 10,000+ pages. Benchmark both before committing to a framework at scale.
Quality scoring algorithm
Build a quality score before any page is published. The score gates what gets generated.
// lib/quality-score.ts
export interface PageQualityResult {
score: number; // 0-100
grade: 'A' | 'B' | 'C' | 'F';
passes: boolean; // true if score >= threshold
reasons: string[]; // why points were deducted
}
interface LocationPageData {
providerCount: number;
topProviders: unknown[];
averagePrice: number;
localStat: string;
reviewCount: number;
nearbyLocations: string[];
uniqueBodyWords: number; // word count of unique (non-boilerplate) content
}
export function scoreLocationPage(
data: LocationPageData,
threshold = 65
): PageQualityResult {
let score = 0;
const reasons: string[] = [];
// Provider data quality (30 points)
if (data.providerCount >= 10) score += 30;
else if (data.providerCount >= 5) score += 20;
else { score += 0; reasons.push(`Low provider count: ${data.providerCount}`); }
// Pricing data present (20 points)
if (data.averagePrice > 0) score += 20;
else reasons.push('No average price data');
// Local stat present (15 points)
if (data.localStat && data.localStat.length >= 30) score += 15;
else reasons.push('Missing or thin local stat');
// Review data (15 points)
if (data.reviewCount >= 10) score += 15;
else if (data.reviewCount >= 3) score += 8;
else reasons.push(`Insufficient reviews: ${data.reviewCount}`);
// Internal linking potential (10 points)
if (data.nearbyLocations.length >= 4) score += 10;
else reasons.push('Not enough nearby locations for internal links');
// Unique content density (10 points)
if (data.uniqueBodyWords >= 200) score += 10;
else reasons.push(`Low unique word count: ${data.uniqueBodyWords}`);
const passes = score >= threshold;
const grade = score >= 85 ? 'A' : score >= 70 ? 'B' : score >= 55 ? 'C' : 'F';
return { score, grade, passes, reasons };
}

Run this during generateStaticParams / getStaticPaths and log all failed pages to a file for review. Do not silently drop them - you want to know what data is missing.
// scripts/quality-audit.ts
import { loadAllLocationData } from '@/lib/data';
import { scoreLocationPage } from '@/lib/quality-score';
import fs from 'fs';
const all = await loadAllLocationData();
const failed = all
.map((page) => ({ page, result: scoreLocationPage(page) }))
.filter(({ result }) => !result.passes);
console.log(`Quality audit: ${failed.length}/${all.length} pages below threshold`);
fs.writeFileSync(
'reports/quality-audit.json',
JSON.stringify(failed, null, 2)
);

Batch publishing cadence
Publishing schedule to avoid triggering Google's quality review systems:
| Phase | Volume | Cadence | Goal |
|---|---|---|---|
| Seed | 50-100 pages | Day 1 | Validate template in real SERP |
| Observe | 0 new pages | Weeks 2-6 | Check indexing, coverage, ranking |
| Ramp | 100-200/day | Weeks 7-10 | Scale while monitoring coverage |
| Cruise | 500-1000/day | Week 11+ | Full velocity once proven |
| Maintenance | On data update | Ongoing | Refresh stale pages |
Publish via sitemap submission, not manual URL inspection. Submitting 10,000 URLs manually via Search Console wastes your inspection quota. Instead, submit the sitemap index and let Googlebot discover pages at its own crawl rate.
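A minimal sketch of the sitemap-index approach: chunk URLs into files of at most 50,000 entries per the sitemaps.org protocol, then list each file in one index. Function names are illustrative.

```typescript
// Chunk page URLs into sitemap-sized groups, then emit a sitemap index
// that references each chunk file.
const MAX_URLS_PER_SITEMAP = 50_000; // sitemaps.org protocol limit

export function chunkUrls(urls: string[], size = MAX_URLS_PER_SITEMAP): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

export function sitemapIndexXml(sitemapUrls: string[]): string {
  const entries = sitemapUrls
    .map((loc) => `  <sitemap><loc>${loc}</loc></sitemap>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${entries}\n</sitemapindex>`;
}
```

Submit only the index URL to Search Console; Googlebot discovers the chunk files and the pages within them at its own crawl rate.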
Avoid mass deletions. If a page gets deindexed, do not delete it - redirect it to the hub page. Mass 404s damage crawl budget trust. Only hard-delete pages that were never indexed.
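The redirect rule above can be wired into Next.js's `redirects()` config. This is a hedged sketch: the `RetiredPage` shape and the idea of loading retired paths from a report are assumptions about your data pipeline.

```typescript
// Map each retired pSEO page to its hub with a permanent redirect
// instead of letting it 404.
interface RetiredPage {
  path: string;    // e.g. "/plumbers/tx/smallville"
  hubPath: string; // e.g. "/plumbers/tx"
}

export function buildRedirects(retired: RetiredPage[]) {
  return retired.map((p) => ({
    source: p.path,
    destination: p.hubPath,
    permanent: true, // 301 - consolidates signals into the hub page
  }));
}

// In next.config.js (sketch):
// async redirects() { return buildRedirects(await loadRetiredPages()); }
```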
Frequently Asked Questions
What is programmatic-seo?
Use this skill when building programmatic SEO pages at scale - template-based page generation, data-driven landing pages, automated internal linking, and avoiding thin content or doorway page penalties. Triggers on generating thousands of location pages, comparison pages, tool pages, or any template-driven SEO content strategy that creates pages programmatically from data sources.
How do I install programmatic-seo?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill programmatic-seo in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support programmatic-seo?
programmatic-seo works with claude-code, gemini-cli, openai-codex, mcp. Install it once and use it across any supported AI coding agent.