geo-optimization
Use this skill when optimizing for AI-powered search engines and generative search results - Google AI Overviews, ChatGPT Search (SearchGPT), Perplexity, Microsoft Copilot Search, and other LLM-powered answer engines. Covers Generative Engine Optimization (GEO), citation signals for AI search, entity authority, LLMs.txt specification, and LLM-friendliness patterns based on Princeton GEO research. Triggers on visibility in AI search, getting cited by LLMs, or adapting SEO for the AI search era.
geo-optimization
geo-optimization is a production-ready AI agent skill for claude-code, gemini-cli, openai-codex, and mcp. It covers optimizing for AI-powered search engines and generative search results - Google AI Overviews, ChatGPT Search (SearchGPT), Perplexity, Microsoft Copilot Search, and other LLM-powered answer engines.
Quick Facts
| Field | Value |
|---|---|
| Category | marketing |
| Version | 0.1.0 |
| Platforms | claude-code, gemini-cli, openai-codex, mcp |
| License | MIT |
How to Install
- Make sure you have Node.js installed on your machine.
- Run the following command in your terminal:
npx skills add AbsolutelySkilled/AbsolutelySkilled --skill geo-optimization
- The geo-optimization skill is now available in your AI coding agent (Claude Code, Gemini CLI, OpenAI Codex, etc.).
Overview
Generative Engine Optimization (GEO) is the emerging discipline of optimizing content so that AI-powered search engines cite it in their synthesized answers. Unlike traditional SEO - where success means ranking a blue link on page one - GEO success means getting your content quoted, paraphrased, or linked inside an AI-generated response from Google AI Overviews, ChatGPT Search, Perplexity, or Microsoft Copilot Search.
This field is nascent and evolving fast. The foundational research (notably Princeton's 2023 GEO paper) provides early empirical evidence, but best practices are still being discovered in the wild. Treat every strategy here as a working hypothesis subject to revision as AI search products mature, change their retrieval logic, and shift their citation behaviors.
Important: GEO supplements traditional SEO - it does not replace it. AI search engines primarily cite pages that already have domain authority and ranking signals. A strong traditional SEO foundation is a prerequisite, not an alternative.
Tags
seo geo generative-search ai-overviews chatgpt-search perplexity llms-txt
Platforms
- claude-code
- gemini-cli
- openai-codex
- mcp
Frequently Asked Questions
What is geo-optimization?
Use this skill when optimizing for AI-powered search engines and generative search results - Google AI Overviews, ChatGPT Search (SearchGPT), Perplexity, Microsoft Copilot Search, and other LLM-powered answer engines. Covers Generative Engine Optimization (GEO), citation signals for AI search, entity authority, LLMs.txt specification, and LLM-friendliness patterns based on Princeton GEO research. Triggers on visibility in AI search, getting cited by LLMs, or adapting SEO for the AI search era.
How do I install geo-optimization?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill geo-optimization in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support geo-optimization?
This skill works with claude-code, gemini-cli, openai-codex, mcp. Install it once and use it across any supported AI coding agent.
Maintainers
Generated from AbsolutelySkilled
SKILL.md
Generative Engine Optimization (GEO)
When to use this skill
Trigger this skill when the task involves:
- Improving visibility in AI search results (Google AI Overviews, ChatGPT Search, Perplexity)
- Getting cited by LLMs when users ask questions relevant to your domain
- Auditing content for AI search citability
- Implementing a /llms.txt file to make site content AI-readable
- Optimizing entity presence so AI engines recognize your brand or product authoritatively
- Structuring content for AI extraction (definitions, statistics, expert quotes)
- Understanding why competitors appear in AI Overviews and you do not
- Adapting an existing SEO content strategy for the generative search era
Do NOT trigger this skill for:
- Traditional SERP ranking (blue-link SEO) - use a dedicated SEO skill
- Technical crawlability issues (robots.txt, sitemaps, Core Web Vitals) - those are prerequisites to GEO, not GEO itself
Key principles
Entity authority matters more than page authority in AI search. AI engines build knowledge graphs. Being recognized as an authoritative entity (brand, person, concept) across Wikipedia, Wikidata, structured data markup, and consistent web mentions increases citation probability more than raw domain authority alone.
Citability over clickability. Traditional SEO optimizes the title/meta for click-through. GEO optimizes the content body for AI extraction. Write content that can be quoted verbatim - specific, attributable, factually dense claims.
Statistics, data, and expert quotes increase citation probability. Princeton's GEO research found that adding authoritative statistics, citing sources within content, and including expert quotations improved AI citation rates by 30-40% in controlled experiments. Data-backed claims are preferred over opinion.
LLMs.txt makes your content explicitly available for AI consumption. The /llms.txt specification (inspired by robots.txt) provides a structured, curated entry point that AI crawlers can use to understand your site's content hierarchy without guessing.
GEO supplements traditional SEO; it does not replace it. AI Overviews pull from pages that already rank. Strong backlink profiles, E-E-A-T signals, and technical SEO hygiene remain foundational requirements.
Core concepts
How AI search engines work (Retrieval-Augmented Generation)
AI search engines use a Retrieval-Augmented Generation (RAG) architecture. When a user submits a query, the system: (1) retrieves candidate pages using a traditional search index, (2) extracts relevant passages from those pages, (3) passes those passages as context to a large language model, and (4) generates a synthesized answer with citations.
This means two things: your page must be indexable and retrievable (traditional SEO), AND the extracted passage must be clear, specific, and quotable enough for the LLM to use it (GEO).
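The four-step flow above can be sketched as a toy pipeline. This is purely illustrative - real engines use learned rankers and embedding retrieval, not keyword overlap, and the "LLM" here is a stand-in function; all names and URLs are invented for the example.

```python
# Toy sketch of the RAG pipeline behind AI search (illustrative only).

def retrieve(query, index):
    """Step 1: score candidate pages by naive keyword overlap."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(page["text"].lower().split())), page)
              for page in index]
    return [p for score, p in sorted(scored, key=lambda s: -s[0]) if score]

def extract_passages(pages, max_passages=3):
    """Step 2: take the most relevant passage from each candidate page."""
    return [{"url": p["url"], "passage": p["text"]} for p in pages[:max_passages]]

def generate_answer(query, passages):
    """Steps 3-4: stand-in for the LLM call -- synthesize with citations."""
    cited = ", ".join(p["url"] for p in passages)
    return f"Answer to {query!r} synthesized from: {cited}"

index = [
    {"url": "https://docs.example.com/caching",
     "text": "Redis caching stores hot data in memory"},
    {"url": "https://example.com/blog",
     "text": "Our company history and mission"},
]
query = "how does Redis caching work"
answer = generate_answer(query, extract_passages(retrieve(query, index)))
print(answer)
```

Note how the blog page never reaches the generation step: if retrieval does not surface your page, the LLM cannot cite it, which is why the traditional-SEO prerequisite keeps coming up.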
The citation mechanism
When an AI engine cites a source, it has determined that a passage from that page best answers part of the query. Citation selection is influenced by:
- Semantic relevance of the passage to the query
- Source domain authority and trustworthiness signals
- Content structure (well-delimited claims are easier to extract)
- Presence of unique data or authoritative attribution
Entity recognition and knowledge graphs
AI engines maintain implicit knowledge graphs. When they process a query about "Stripe payments" they recognize Stripe as an entity with known attributes. If your content is consistently associated with an entity (through schema.org markup, Wikipedia mentions, and consistent naming across the web), the AI engine is more likely to trust and cite your content on topics related to that entity.
Princeton GEO research findings
The 2023 Princeton GEO paper tested nine optimization strategies on a benchmark of 10,000 queries across Bing, Google, and Perplexity. Key findings:
- Adding authoritative statistics increased citation by ~40%
- Citing reputable sources within content increased citation by ~30%
- Using an authoritative/confident tone improved inclusion rates
- Adding expert quotations improved results in informational content
- Fluency improvements (fixing grammar/clarity) had modest but consistent gains
- Simply adding more keywords did not significantly improve citation rates
AI Overviews vs traditional featured snippets
Google's featured snippets (position zero) are extracted verbatim from a single page. AI Overviews synthesize across multiple sources and rewrite the content. This means a single authoritative source can no longer monopolize a topic - GEO requires building authority across a content cluster, not just a single optimized page.
Common tasks
Audit content for AI search citability
Walk through each piece of content and check:
- Claims specificity - Replace "our tool improves performance" with "our tool reduced average page load time by 340ms in A/B testing across 50,000 sessions."
- Source attribution - Cite third-party studies, reports, or standards when making claims. "According to the 2024 State of DevOps Report..."
- Structure clarity - Ensure definitions, how-tos, and comparisons are in clearly delimited sections with descriptive headings. AI extractors favor self-contained paragraphs that answer a question completely.
- Entity consistency - Does your brand/product name appear consistently across the page, schema markup, and linked social/Wikipedia pages?
Scoring rubric (use as checklist):
- Every major claim has a specific data point or source
- Page has schema.org markup (Article, Organization, FAQPage, or HowTo)
- At least one expert quote or attributed statement per major section
- Headings are question-answering, not just topical ("How does X work?" not "About X")
- Entity name consistent in content, title, schema, and URL
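The rubric above can be partially automated with crude heuristics. The sketch below is an assumption-laden starting point, not a real GEO tool - the regex patterns and the sample page are illustrative, and a real audit would also check schema markup and entity naming.

```python
import re

# Rough heuristic audit for three rubric items: statistics present,
# claims attributed, and question-style headings.

def audit_citability(text: str) -> dict:
    return {
        # specific numbers with units suggest data-backed claims
        "has_statistics": bool(
            re.search(r"\b\d+(\.\d+)?\s*(%|ms|percent)", text)),
        # attribution phrases suggest sourced claims
        "has_attribution": bool(
            re.search(r"(?i)\baccording to\b|\breports? that\b|\bper the\b", text)),
        # question-style H2/H3 headings extract better than topical ones
        "question_heading": bool(
            re.search(r"(?im)^#{2,3}\s+(how|what|why|when)\b", text)),
    }

page = """## How does caching work?
According to the 2024 benchmark, caching cut p95 latency by 42%."""
print(audit_citability(page))
```

Run it over a content inventory and flag any page where all three checks fail - those are the citation-invisible pages worth updating first.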
Add citation-boosting elements
Statistics pattern:
Before: "Many companies struggle with cloud costs."
After: "According to Gartner's 2024 Cloud Report, 73% of enterprises exceeded their cloud budgets in the prior fiscal year."
Expert quote pattern:
Before: "Security is critical in modern APIs."
After: "As OWASP notes in its API Security Top 10: 'Broken object-level authorization is the most commonly exploited API vulnerability, affecting an estimated 40% of production APIs.'"
Definition pattern (high citability):
[TERM] is [concise, complete definition]. [One-sentence elaboration with a specific example or data point].
Definitions that are clear and complete in a single paragraph are frequently cited verbatim by AI engines answering "what is X" queries.
Implement an LLMs.txt file
Create /llms.txt at your site root. This file signals to AI crawlers what your site
contains and where to find authoritative content. See references/llms-txt-spec.md
for the full specification.
Minimal working example:
# Acme Developer Docs
> API documentation for Acme's payment processing platform.
## Documentation
- [API Reference](https://docs.acme.com/api): Full REST API reference with all endpoints
- [Quickstart](https://docs.acme.com/quickstart): Get your first payment running in 5 minutes
- [Authentication](https://docs.acme.com/auth): API keys, OAuth 2.0, webhook signatures
- [SDKs](https://docs.acme.com/sdks): Official libraries for Node.js, Python, Ruby, Go
## About
- [Company](https://acme.com/about): About Acme and our mission
- [Blog](https://acme.com/blog): Engineering and product updates
Deploy at https://yourdomain.com/llms.txt. Ensure it is accessible to crawlers (not blocked by robots.txt).
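Before deploying, a quick structural check catches the common mistakes (missing H1 title, missing summary, malformed link lines). This sketch assumes the minimal shape shown above - one H1, an optional "> " summary, and "- [name](url): description" links - which matches the example but is not a full implementation of the spec.

```python
import re

# Minimal llms.txt structural check (a sketch, not a spec-complete validator).

def validate_llms_txt(text: str) -> dict:
    lines = [l for l in text.splitlines() if l.strip()]
    links = re.findall(r"^- \[([^\]]+)\]\((https?://[^)]+)\)", text, re.M)
    return {
        "has_title": bool(lines) and lines[0].startswith("# "),
        "has_summary": any(l.startswith("> ") for l in lines),
        "link_count": len(links),
        "urls": [url for _name, url in links],
    }

sample = """# Acme Developer Docs
> API documentation for Acme's payment processing platform.

## Documentation
- [API Reference](https://docs.acme.com/api): Full REST API reference
- [Quickstart](https://docs.acme.com/quickstart): First payment in 5 minutes
"""
report = validate_llms_txt(sample)
print(report)
```

The extracted URL list is also a handy input for the next check that matters more: confirming each linked page is actually indexed and crawlable.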
Optimize entity presence
Entity authority is built through consistent signals across the web:
- Wikipedia/Wikidata - Create or improve entries for your brand, product, or founders where notable. AI engines heavily weight Wikipedia as a trusted entity source.
- Schema.org markup - Add Organization, Product, Person, or SoftwareApplication schema to relevant pages. This explicitly tells crawlers what entities exist on your site.
- Consistent NAP - Name, Address, Phone (for local entities) must be identical across Google Business Profile, LinkedIn, Crunchbase, and your site.
- Knowledge panel - If a Google Knowledge Panel exists for your entity, claim it and ensure the data is accurate. This feeds into AI Overview entity recognition.
- Cross-domain mentions - Earn mentions and links from authoritative domains in your category. AI engines use co-citation patterns to build entity authority.
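An Organization schema block with sameAs links is the single markup change that ties these signals together. The example below generates the JSON-LD; every name and URL is a placeholder - substitute your one canonical entity name and your real profile URLs.

```python
import json

# Illustrative Organization JSON-LD (schema.org). All values are placeholders.

org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Acme",                      # one canonical name, used everywhere
    "url": "https://acme.com",
    "logo": "https://acme.com/logo.png",
    "sameAs": [                          # ties the entity to external profiles
        "https://en.wikipedia.org/wiki/Acme",
        "https://www.linkedin.com/company/acme",
        "https://www.crunchbase.com/organization/acme",
    ],
}
jsonld = json.dumps(org, indent=2)
print(f'<script type="application/ld+json">\n{jsonld}\n</script>')
```

Place the script tag in the head of your homepage (and keep the name field byte-identical to the name used in content and titles).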
Structure content for AI extraction
AI extractors prefer content that is:
- Self-contained: A single paragraph should fully answer the sub-question without requiring the reader to read the entire article for context.
- Scannable with semantic headings: Use H2/H3 headings phrased as questions or clear topic labels. "How does caching work in Redis?" outperforms "Caching" as a heading.
- Table-friendly for comparisons: Comparison data in tables (with clear column headers) is highly extractable. AI engines frequently synthesize comparison answers from tables.
- FAQPage schema for Q&A content: If your page answers multiple distinct questions, add FAQPage schema markup. This gives the AI direct access to the Q/A pairs.
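For the FAQPage case, the markup is mechanical enough to generate from your Q/A pairs. A sketch, with placeholder question and answer text:

```python
import json

# Illustrative FAQPage JSON-LD built from Q/A pairs (placeholder content).

faqs = [
    ("What is GEO?",
     "Generative Engine Optimization is the practice of optimizing content "
     "so that AI search engines cite it in their synthesized answers."),
    ("Does GEO replace SEO?",
     "No. AI engines cite pages that already rank, so traditional SEO "
     "remains a prerequisite."),
]
faq_page = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {"@type": "Question", "name": q,
         "acceptedAnswer": {"@type": "Answer", "text": a}}
        for q, a in faqs
    ],
}
print(json.dumps(faq_page, indent=2))
```

Each acceptedAnswer should be the same self-contained paragraph that appears in the visible page body - schema that diverges from visible content risks being ignored.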
Monitor AI search visibility
The tooling ecosystem for GEO monitoring is immature as of early 2025. Available approaches:
Manual spot-checking (free, reliable):
- Search your target queries in ChatGPT (web browsing mode), Perplexity, and Google (for AI Overviews) regularly
- Note which competitors are cited and what passage is being pulled from their pages
- Identify the content patterns those passages share
Emerging tools (validate independently - landscape is changing fast):
- Semrush, Ahrefs, and BrightEdge are developing AI search visibility features
- AI Rank trackers like Rankscale or similar tools may track AI citation presence
- Manual Perplexity search with "site:" filtering can help audit your domain's presence
Baseline tracking: Build a spreadsheet of 20-50 target queries. For each, record monthly whether your domain appears in AI Overviews, ChatGPT Search, and Perplexity results. Track the trend.
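The baseline-tracking spreadsheet is simple enough to generate programmatically: one row per (query, engine) per month, recording whether your domain was cited during the manual spot check. The engine names and query below are examples.

```python
import csv
import datetime
import io

# Sketch of the baseline-tracking sheet described above.

ENGINES = ["google_ai_overviews", "chatgpt_search", "perplexity"]

def record_snapshot(queries, observations, month=None):
    """observations: {(query, engine): True/False} from manual spot checks."""
    month = month or datetime.date.today().strftime("%Y-%m")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["month", "query", "engine", "domain_cited"])
    for q in queries:
        for e in ENGINES:
            writer.writerow([month, q, e, observations.get((q, e), False)])
    return buf.getvalue()

sheet = record_snapshot(
    ["what is generative engine optimization"],
    {("what is generative engine optimization", "perplexity"): True},
    month="2025-01",
)
print(sheet)
```

Append each month's snapshot to one CSV and the trend per engine falls out of a pivot table.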
Adapt existing content strategy for GEO
For teams with established SEO content programs:
- Prioritize data-rich content - Commission or publish original research, surveys, and benchmark reports. Original data is a citation magnet for AI engines.
- Update thin content - Pages that rank but lack specific data are citation-invisible to AI. Audit top-ranking pages and add statistics, quotes, and definitions.
- Build content clusters with entity focus - Rather than isolated posts, build clusters of 5-10 articles around a single entity or concept, with strong internal linking. AI engines recognize topical authority through cluster density.
- Add author entity markup - If content is from a recognized expert, add author schema with sameAs links to their LinkedIn, Google Scholar, or Wikipedia. Author authority feeds into E-E-A-T signals that AI engines evaluate.
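The author markup looks like this in practice. The headline, name, and profile URLs below are placeholders - use the author's real, consistently spelled name and their actual profiles.

```python
import json

# Illustrative Article JSON-LD with author sameAs links (placeholder values).

article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How GEO differs from SEO",
    "datePublished": "2025-01-15",
    "author": {
        "@type": "Person",
        "name": "Jane Doe",
        "sameAs": [
            "https://www.linkedin.com/in/janedoe",
            "https://scholar.google.com/citations?user=janedoe",
        ],
    },
}
print(json.dumps(article, indent=2))
```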
Anti-patterns
| Anti-pattern | Why it fails |
|---|---|
| Optimizing only for AI search, ignoring traditional SEO | AI engines cite pages that already rank. Without indexing and authority, GEO efforts are invisible. |
| Blocking AI crawlers in robots.txt | Disallowing Googlebot, GPTBot, PerplexityBot, or ClaudeBot removes you from AI search entirely. Audit which bots you are and are not blocking. |
| Stuffing fake or unverifiable statistics | AI engines and human readers both lose trust. Fabricated data backfires badly if cited and then fact-checked. |
| Inconsistent entity naming | Referring to your product as "Acme", "Acme.io", and "The Acme Platform" in different places dilutes entity recognition. Pick one canonical name. |
| Treating GEO techniques as stable | The field is evolving month by month. What works today on Perplexity may not work on next year's Google AI Overviews. Revisit strategy quarterly. |
| One-page GEO fix ("just add llms.txt") | LLMs.txt alone does not create citations. It is one signal among many. Entity authority and content quality matter far more. |
| Assuming AI search replaces traditional search traffic | Most search volume still flows through traditional results. Zero-click AI answers may reduce some traffic; the net impact is still being measured. |
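The crawler-blocking anti-pattern above is easy to check mechanically with Python's standard-library robots.txt parser, using only the file's text. The user-agent tokens below are the published AI crawler names as of early 2025; the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Check whether common AI crawlers may fetch a URL under a given robots.txt.

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def audit_robots(robots_txt: str, url: str = "https://example.com/") -> dict:
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, url) for bot in AI_BOTS}

# A blanket "block everything" rule silently excludes every AI crawler:
blanket = "User-agent: *\nDisallow: /\n"
print(audit_robots(blanket))
```

Run this against your live robots.txt before any other GEO work - a leftover blanket Disallow makes everything else in this skill irrelevant.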
Gotchas
Blocking AI crawlers in robots.txt removes you from AI search entirely - GPTBot, ClaudeBot, PerplexityBot, and Google-Extended are AI crawler user agents. A blanket "User-agent: *" with "Disallow: /" or a past "block all bots" rule may be excluding all AI crawlers silently. Audit robots.txt before any GEO effort - being unindexable is the failure mode that makes all other GEO work irrelevant.
Adding statistics without sources backfires worse than having none - AI engines cross-reference claims against their training data. Fabricated or unsourced statistics that conflict with known data cause the content to be scored as low-trust and excluded from citations. Every data point must link to a verifiable primary source (report, study, official dataset).
LLMs.txt helps AI crawlers but doesn't help if the pages aren't indexed - LLMs.txt is a navigation aid, not a crawling permission grant. If the pages it points to are blocked by robots.txt, return errors, or are not indexed in Google, AI systems that use traditional search indices to retrieve content will never see them. Fix indexing and crawlability first.
Entity inconsistency across brand touchpoints dilutes knowledge graph recognition - If your product is called "Acme" on your website, "Acme.io" in press mentions, and "The Acme Platform" in your schema markup, AI engines build three weak entity nodes instead of one strong one. Standardize the canonical brand name across every mention, schema field, and social profile before building entity authority.
GEO strategies are engine-specific and change frequently - What increases citation probability on Perplexity today may not affect Google AI Overviews, and both may change their retrieval behavior within months. Never apply GEO tactics without specifying which engine you are targeting, and revisit your strategy at least quarterly.
References
Load these files when going deeper on specific topics:
- references/ai-search-engines.md - How each AI search engine works (Google AI Overviews, ChatGPT Search, Perplexity, Copilot Search), citation patterns, and what increases inclusion probability per engine. Load when engine-specific strategy is needed.
- references/citation-signals.md - Princeton GEO research findings in detail, full list of citation-boosting signals, entity authority factors, structured data impact. Load when auditing content or building a GEO optimization checklist.
- references/llms-txt-spec.md - Full LLMs.txt specification: format, syntax, what to include, relationship to robots.txt, llms-full.txt variant, adoption status, and example implementations. Load when implementing or advising on LLMs.txt.
References
ai-search-engines.md
AI Search Engines - GEO Reference
How each major AI search engine works, what triggers citations, and what content signals increase the probability of being included. Note that all AI search products are actively evolving - retrieval logic, citation formats, and ranking factors change with model updates. Treat everything here as current as of early 2025.
Google AI Overviews
What it is
Google AI Overviews (formerly Search Generative Experience / SGE) is Google's AI-generated answer panel that appears above organic results for qualifying queries. It synthesizes a multi-paragraph response by pulling from multiple pages and cites sources inline with expandable links.
What triggers AI Overviews
AI Overviews appear on:
- Informational and research queries ("how does X work", "what is the best Y")
- Multi-faceted questions that traditionally required clicking multiple results
- Comparison queries ("X vs Y", "pros and cons of Z")
- How-to and step-by-step queries
AI Overviews generally do NOT appear on:
- Navigational queries (brand name lookups, "login to Gmail")
- Local queries (Google favors Maps/Local Pack)
- Recent news (real-time freshness requirement that AI Overviews cannot meet)
- YMYL (Your Money Your Life) queries where accuracy risk is high - Google is conservative
How citations are selected
Google's AI Overviews pull from pages that already rank in the top ~20 organic results. If your page doesn't rank on page 1-2 for a query, it is very unlikely to be cited in the AI Overview for that query. This is the single most important implication for GEO: AI Overviews are a layer on top of traditional ranking, not a replacement for it.
Within eligible pages, Google's extraction logic favors:
- Self-contained passages that directly answer the query question
- Content with clear, unambiguous factual claims
- Structured content (numbered lists, tables, definition blocks)
- Pages with strong E-E-A-T signals (experience, expertise, authoritativeness, trust)
- Schema.org markup that explicitly defines content type (HowTo, FAQPage, Article)
What increases citation probability on Google
- Rank first - Appear in organic results for the target query (prerequisite)
- FAQPage schema - Explicitly structured Q&A markup directly feeds AI Overview extraction
- HowTo schema - Step-by-step content with HowTo markup is frequently cited in procedural AI Overviews
- Authoritative statistics - Specific, sourced data points are frequently lifted verbatim
- Clear topic headings - H2/H3 headers phrased as questions match query intent directly
- E-E-A-T signals - Author bios with credentials, publication dates, sources cited in the body of the content
Known limitations and behaviors
- Google does not guarantee citation even if you rank #1 - it picks the best passage, not necessarily from the top result
- AI Overview citations can change day to day as Google updates the underlying model
- Google has been observed pulling from pages that rank position 10-20 if those pages have a more precisely matching passage than the top result
- Content blocked from Googlebot indexing will not appear in AI Overviews
ChatGPT Search (SearchGPT)
What it is
ChatGPT Search (previously called SearchGPT, now integrated into ChatGPT's browsing mode) is OpenAI's real-time web search feature. When a user's query requires current information or web sources, ChatGPT triggers a web search, retrieves pages, and synthesizes a response with cited sources shown in a sidebar.
How content gets discovered
ChatGPT Search uses Microsoft Bing's index as its primary search backend (as of 2024). This means:
- Pages that rank in Bing have significantly higher chances of being retrieved
- Bing Webmaster Tools submission is important for ChatGPT Search visibility
- GPTBot (OpenAI's crawler) also separately crawls content for model training and potentially for search retrieval - do not block it unless you have a specific reason
Citation format
ChatGPT Search shows inline citation numbers [1] in the response text, with source cards in a right-panel sidebar showing title, domain, and snippet. Users can expand individual sources to read more. This means your page title and meta description matter for click-through even within AI search - users see them in the citation card.
What increases citation probability
- Bing ranking - Strong Bing SEO is the primary lever. Many sites over-index on Google and under-optimize for Bing. Submit to Bing Webmaster Tools, ensure Bingbot is not blocked.
- Content freshness - ChatGPT Search has a recency bias for news and time-sensitive topics. Updated publication dates and fresh content signal relevance.
- Clear, factual prose - ChatGPT's retrieval favors content that can be summarized accurately. Ambiguous or heavily opinionated content is cited less.
- Structured answers - Well-organized content with clear answer sections is pulled more reliably than content buried in long narratives.
- Domain trustworthiness - News sites, official documentation, and established domains are cited more frequently than new or thin-authority sites.
GPTBot crawler
OpenAI's crawler is GPTBot. Allow it unless you have a specific reason not to:
# robots.txt - allow GPTBot
User-agent: GPTBot
Allow: /
Blocking GPTBot may reduce inclusion in ChatGPT Search retrieval. If you want to allow crawling but opt out of training data use, OpenAI provides a separate mechanism via their data opt-out process (distinct from the robots.txt signal).
Perplexity
What it is
Perplexity is a standalone AI search engine that positions itself as an "answer engine" rather than a traditional search engine. Every query gets an AI-synthesized response with multiple cited sources displayed in a prominent sources panel. Perplexity has strong adoption among researchers, developers, and technical users.
How content gets discovered
Perplexity uses a combination of:
- Its own PerplexityBot crawler (indexes content directly)
- Bing's index as a fallback for content not yet in its own index
- Real-time search for very recent queries
PerplexityBot crawls aggressively. Ensure it is not blocked:
User-agent: PerplexityBot
Allow: /
Sources panel behavior
Perplexity shows 4-6 sources in a right-column panel for each response. These sources are visible and clickable, making Perplexity one of the better AI search engines for referral traffic. Unlike Google AI Overviews (which can reduce click-through), Perplexity citations actively drive visits.
Pro Search vs standard search
Perplexity Pro Search (available to Pro subscribers) does multi-step research - it generates sub-queries, retrieves multiple rounds of sources, and synthesizes a more comprehensive answer. Content that appears in standard searches will also appear in Pro searches, but Pro Search sometimes retrieves deeper technical content that standard search misses.
What increases citation probability
- Technical and research-heavy content - Perplexity's user base skews technical. Content with depth, specificity, and citations is disproportionately favored.
- Primary sources and original research - Perplexity frequently cites academic papers, official documentation, and original data publications.
- Content freshness - Perplexity Pro Search heavily weights freshness for time-sensitive queries. Keep publication and modification dates accurate.
- Well-structured content - Perplexity's extraction logic handles markdown-style content well. Clear headings, bullet points, and numbered lists extract cleanly.
- PerplexityBot access - Not blocking PerplexityBot is table stakes. Giving it access to all content (not just crawl-allowed pages) increases coverage.
Microsoft Copilot Search
What it is
Microsoft Copilot integrates AI search directly into Bing and the Windows operating system. It shares the Bing index and uses GPT-4-class models (via Microsoft's OpenAI partnership) to generate answers. Copilot appears in Bing's search results, the Microsoft Edge sidebar, Windows search, and Microsoft 365 applications.
How content gets discovered
Copilot Search runs entirely on Bing's index. This means:
- All Bing SEO factors apply directly (Bing Webmaster Tools, Bingbot allowance)
- Bing's trust signals differ slightly from Google's - Bing places more weight on social signals (LinkedIn, Twitter/X engagement) and less on raw backlink count
- Bingbot (and the legacy msnbot user agent) must not be blocked
User-agent: msnbot
Allow: /
User-agent: bingbot
Allow: /
Citation format
Copilot Search shows a generated response at the top of Bing results with citations embedded inline, similar to Google AI Overviews but with Bing's look and feel. Source links are shown below the generated response.
What increases citation probability
- Bing ranking - Primary factor, same as for ChatGPT Search
- Social proof signals - Bing weights LinkedIn presence and social sharing signals. Content that earns social engagement indexes well on Bing.
- Structured data - Bing fully supports schema.org markup and uses it for Copilot citation selection.
- HTTPS and security - Bing has historically penalized non-HTTPS or mixed-content pages more aggressively than Google.
Cross-engine GEO checklist
Apply these to maximize visibility across all four engines:
| Signal | Google AI Overviews | ChatGPT Search | Perplexity | Copilot Search |
|---|---|---|---|---|
| Traditional organic ranking | Critical (Google) | Important (Bing) | Moderate | Critical (Bing) |
| Allow engine crawler | Googlebot | GPTBot + Bingbot | PerplexityBot | Bingbot + msnbot |
| Schema.org markup | High impact | Moderate | Low direct impact | Moderate |
| Content freshness | Moderate | High for news | High | High |
| Statistical claims | High | High | High | High |
| E-E-A-T / author authority | High | Moderate | Moderate | Moderate |
| LLMs.txt | Emerging | Emerging | Emerging | Emerging |
citation-signals.md
Citation Signals - GEO Reference
Detailed coverage of what makes content more likely to be cited by AI search engines, grounded in the Princeton GEO research and extended with observed best practices.
Princeton GEO Research (2023)
Overview
The foundational academic paper on GEO is "GEO: Generative Engine Optimization" by Aggarwal et al. from Princeton University (arXiv:2311.09735, 2023). The paper introduced the term "GEO" and ran controlled experiments measuring how different optimization strategies affected the fraction of AI-generated responses that included content from a given source.
Benchmark methodology
The researchers built a benchmark called GEO-bench with 10,000 queries across multiple domains (healthcare, law, technology, finance, travel) and tested responses from Bing, Google (SGE), and Perplexity. For each query they measured whether a given source was included in the AI-generated response (citation presence) and if so, how prominently.
The "GEO score" they defined = (weighted sum of source citations across a response) / (total response length), measuring both presence and prominence.
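A toy version of that idea can be computed as follows. This is a simplified interpretation, not the paper's exact formula - in particular the position-decay weighting below is an assumption introduced for illustration, and the response sentences are invented.

```python
# Toy GEO-score illustration: the share of an AI response attributable to a
# source, weighted so that earlier (more prominent) citations count more.

def geo_score(response_sentences, source):
    """response_sentences: list of (sentence_text, cited_source) pairs."""
    total = sum(len(text.split()) for text, _ in response_sentences)
    weighted = sum(
        len(text.split()) / (i + 1)          # position-decay weight (assumed)
        for i, (text, src) in enumerate(response_sentences)
        if src == source
    )
    return weighted / total if total else 0.0

response = [
    ("GEO is the practice of optimizing for AI citations.", "yourdomain.com"),
    ("It was introduced by Princeton researchers in 2023.", "arxiv.org"),
]
print(round(geo_score(response, "yourdomain.com"), 3))
```

The takeaway the metric encodes: being the first-cited source for the core of the answer scores higher than a brief mention at the end, even for the same number of words.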
Optimization strategies tested and results
The paper tested nine distinct optimization strategies:
| Strategy | Description | Observed Impact |
|---|---|---|
| Authoritative statistics | Add specific numeric statistics with source attribution | ~40% improvement in citation rate |
| Citing sources | Include references to external authoritative sources within the content | ~30% improvement |
| Quotations | Add direct quotes from recognized experts or authoritative bodies | Positive impact, especially informational queries |
| Easy-to-understand language | Simplify and improve readability/clarity | Small but consistent improvement |
| Fluency optimization | Fix grammar, sentence structure, overall prose quality | Small consistent improvement |
| Adding unique/technical terms | Use domain-specific terminology correctly | Mixed results - better in technical domains |
| Authoritative tone | Write with confident, authoritative voice vs hedged/tentative | Positive impact across domains |
| Keyword stuffing | Add high-frequency query keywords repeatedly | Minimal or negative impact |
| Citing your own sources | Self-reference your own content | Minimal impact, sometimes negative |
Key takeaways from the research
Qualitative signals beat keyword signals. The strategies that worked were about content quality (statistics, sources, quotes, clarity), not keyword manipulation. This aligns with how RAG-based systems work - they care about passage quality, not keyword density.
Domain matters. The effectiveness of each strategy varied by domain. Technical and scientific domains benefited more from statistics and citations. Humanities and lifestyle domains benefited more from fluency and clarity improvements.
The effect is additive. Applying multiple strategies compounds the improvement. A page that adds statistics AND cites external sources AND includes expert quotes outperforms a page that does only one of those things.
Prominence matters, not just presence. Being mentioned briefly at the end of an AI response is less valuable than being the primary cited source at the start. Content that answers the core of the query (not just peripheral aspects) earns more prominent placement.
Core citation signals
1. Specific, attributable statistics
The single strongest individual signal found in research and observed in practice. Statistics must be:
- Specific: "73% of enterprises exceeded cloud budgets" not "many companies overspend"
- Attributed: "According to Gartner's 2024 Cloud Report" or "per the 2023 Stack Overflow Developer Survey"
- Relevant: Directly supports the claim being made, not tangential
Example transformation:
Weak: "API security is an important concern for modern applications."
Strong: "The OWASP API Security Top 10 reports that broken object-level authorization
affects an estimated 40% of production APIs, making it the leading API
vulnerability class."Original data as a citation superpower: Publishing your own survey, benchmark, or dataset creates statistics that only you can provide. AI engines must cite you if they want to reference that data. Annual reports, developer surveys, and industry benchmarks are high-citation-value content investments.
2. External source citations in content
Including citations to authoritative third-party sources within your content signals to AI engines that your content has been researched and verified. Effective citation patterns:
- Link to primary sources (original research, official standards, authoritative reports) not secondary summaries
- Use academic/journalistic citation style: "A 2023 study published in Nature found..."
- Cite the specific section or finding, not just the source
Note: this is citing external sources within your content to boost your own citability - not the same as building backlinks.
3. Expert quotations
Direct quotes from recognized experts, industry organizations, or authoritative bodies are frequently lifted verbatim by AI engines. To maximize effectiveness:
- Attribute clearly: "Dr. Jane Smith, Professor of Computer Science at MIT, explains:"
- Use quotation marks precisely around the quoted text
- Choose quotes that directly answer common questions in your domain
- Include the quote source context (interview, publication, talk)
For non-original content (curating others' quotes): Always link to the original source.
4. Content clarity and readability
AI extraction is easier when content is:
- Written in plain, direct sentences (Flesch-Kincaid grade 8-12 for most topics)
- Free of grammatical errors and run-on sentences
- Structured so each paragraph addresses one idea
- Free of excessive hedging ("might possibly", "could perhaps") - an authoritative tone signals confident expertise
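The readability target above can be spot-checked programmatically. The sketch below estimates a Flesch-Kincaid grade level; the syllable counter is a rough vowel-group heuristic, not a linguistically exact implementation:

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups; every word has at least one syllable."""
    count = len(re.findall(r"[aeiouy]+", word.lower()))
    # A trailing silent 'e' usually does not add a syllable ("cite", "pace").
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """FK grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return round(
        0.39 * (len(words) / len(sentences))
        + 11.8 * (syllables / len(words))
        - 15.59,
        1,
    )
```

Run it against a draft paragraph and aim for a result in the 8-12 range; dense jargon pushes the score sharply upward.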
5. Schema.org structured data
Schema markup provides machine-readable signals about content structure and entity relationships. High-impact schema types for GEO:
FAQPage:
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [{
"@type": "Question",
"name": "What is Generative Engine Optimization?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Generative Engine Optimization (GEO) is the practice of optimizing..."
}
}]
}
FAQPage schema gives AI engines direct access to Q&A pairs without requiring extraction. This is arguably the highest-leverage structured data type for GEO.
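Because FAQPage markup is repetitive, it is often easiest to generate it from existing Q&A content rather than hand-writing JSON. A minimal sketch in Python (the helper name and structure are illustrative, not from any official library):

```python
import json

def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build a FAQPage JSON-LD string from (question, answer) pairs."""
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": "FAQPage",
            "mainEntity": [
                {
                    "@type": "Question",
                    "name": question,
                    "acceptedAnswer": {"@type": "Answer", "text": answer},
                }
                for question, answer in pairs
            ],
        },
        indent=2,
    )
```

Embed the output in a `<script type="application/ld+json">` tag on the page that actually displays the same Q&A content, since markup that does not match visible content can be ignored or penalized.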
HowTo:
Numbered step instructions with HowTo schema are frequently cited in procedural AI
responses. Include name, text, and optionally image for each step.
Article / NewsArticle: Establishes content type, author, datePublished, and publisher entity. Important for E-E-A-T signal and for establishing freshness.
Organization / SoftwareApplication:
Defines your brand/product as a recognized entity with sameAs links to authoritative
external profiles (Wikipedia, LinkedIn, Crunchbase, GitHub).
Entity authority signals
What entity authority means
In AI search, an "entity" is a distinct, named thing: a company, product, person, technology, or concept. AI engines maintain implicit entity graphs where certain entities are more trusted and recognized than others. High entity authority means the AI engine "knows" your entity and associates it with your domain of expertise.
Building entity authority
Wikipedia and Wikidata presence: Wikipedia is the most heavily weighted entity source for most AI systems. A well-sourced Wikipedia article about your company, product, or founders significantly boosts entity recognition. Wikidata entries (the structured data layer under Wikipedia) are directly ingested by many AI knowledge graph systems.
Criteria for Wikipedia notability: typically requires significant coverage in independent, reliable sources. If your entity is notable by Wikipedia standards, having a presence there is very high GEO value. Do not create or edit Wikipedia articles for promotional purposes - it backfires and violates Wikipedia policy.
Knowledge Panel maintenance: Google's Knowledge Panel pulls from Wikipedia, Wikidata, and other structured sources. Verify your entity's panel (if one exists) for accuracy. Claim it through Google Search Console to submit corrections.
Consistent entity naming: Choose a canonical entity name and use it consistently everywhere:
- Website content, title tags, and schema markup
- Social media profiles (LinkedIn, Twitter/X)
- External business directories (Crunchbase, Bloomberg, G2)
- Press releases and earned media
Inconsistency (sometimes "Acme Inc.", sometimes "Acme", sometimes "acme.io") fragments entity association and reduces authority.
sameAs links in schema:
Use sameAs in Organization or Person schema to explicitly link your entity across
different platforms:
{
"@type": "Organization",
"name": "Acme Inc.",
"url": "https://acme.com",
"sameAs": [
"https://en.wikipedia.org/wiki/Acme_Inc",
"https://www.linkedin.com/company/acme-inc",
"https://www.crunchbase.com/organization/acme-inc",
"https://twitter.com/acme"
]
}
Content freshness signals
AI engines favor current information, especially for fast-moving domains. Freshness signals:
- Accurate datePublished and dateModified in Article schema - Do not manipulate these dates; AI engines and search crawlers notice inconsistencies between stated and observed modification dates.
- Annual or periodic content refresh - Review top-ranking content annually to update statistics, add new sections, and remove outdated information.
- Publication date visible in content - "Last updated March 2025" in the page body reinforces freshness signals for AI extractors.
- New content velocity - Sites that publish regularly signal active maintenance and sustained authority. Stale content inventories (no new content in 6+ months) negatively impact entity authority over time.
Domain reputation signals
These are table-stakes prerequisites - without them, content-level GEO efforts are less effective:
- Domain age and backlink profile: Established domains with earned backlinks from authoritative sites are trusted more heavily by all AI engines.
- HTTPS and security: Mandatory. Non-HTTPS domains face ranking penalties that reduce eligibility for AI citations.
- Core Web Vitals: Heavily loaded, slow pages may be crawled less completely, reducing passage extraction quality.
- Thin content audit: Pages with fewer than 400-500 words of substantive content rarely get cited. Consolidate thin pages or expand them with specific, useful content.
- Spam and duplicate content: Duplicate content confuses entity association. Canonical tags must be correctly set.
Quick GEO optimization checklist
For auditing a single piece of content:
- At least one specific, attributed statistic in each major section
- External authoritative sources cited inline (not just in a reference list)
- One or more expert quotes with full attribution
- Headings phrased as questions or clear topic statements
- Each core paragraph is self-contained and answers one question fully
- FAQPage or HowTo schema applied where appropriate
- Article schema with accurate datePublished and author entity
- Organization schema with sameAs links on key landing pages
- Entity name consistent throughout page content and schema
- Page ranks in top 20 for target query (prerequisite for AI Overview citation)
- Engine crawlers (Googlebot, GPTBot, PerplexityBot, Bingbot) not blocked
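Parts of this checklist can be automated. As a sketch, the snippet below uses Python's standard html.parser to list the JSON-LD @type values present on a page, assuming each script tag of type application/ld+json holds a single top-level object (nested @graph structures would need extra handling):

```python
import json
from html.parser import HTMLParser

class SchemaExtractor(HTMLParser):
    """Collect JSON-LD blocks from <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld and data.strip():
            self.blocks.append(json.loads(data))

def schema_types(html: str) -> set[str]:
    """Return the set of top-level @type values found in a page's JSON-LD."""
    parser = SchemaExtractor()
    parser.feed(html)
    return {block.get("@type") for block in parser.blocks}
```

Running this against key landing pages quickly shows which ones are missing FAQPage, Article, or Organization markup.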
llms-txt-spec.md
LLMs.txt Specification - GEO Reference
Complete reference for the /llms.txt specification: what it is, the format, how
to implement it, and its relationship to the broader GEO ecosystem.
What is LLMs.txt?
LLMs.txt is a proposed standard for a plain-text file placed at the root of a website
(/llms.txt) that provides a curated, AI-readable summary of the site's content
structure. It is conceptually similar to robots.txt (which tells crawlers what to
access) but serves a different purpose: rather than controlling access, it guides AI
systems toward the most relevant and authoritative content on the site.
The specification was proposed by Jeremy Howard (fast.ai) in September 2024. As of early 2025, it is not an official W3C or IETF standard - it is a community proposal with growing adoption, especially among developer-facing documentation sites.
LLMs.txt vs robots.txt comparison:
| | robots.txt | llms.txt |
|---|---|---|
| Purpose | Control crawler access | Guide AI systems to relevant content |
| Format | Custom key-value syntax | Markdown |
| Enforcement | Honored by well-behaved crawlers | Advisory only - no enforcement mechanism |
| Location | /robots.txt | /llms.txt |
| Standard body | Informally standardized | Community proposal (no formal body) |
| Required? | De facto required | Optional, growing adoption |
File format
LLMs.txt uses Markdown syntax. The structure is:
# Site or Project Name
> One-sentence tagline or description of what this site/project is.
Optional introductory paragraph providing context that is useful for an AI system
trying to understand this site's purpose, audience, and content scope.
## Section Name
- [Page Title](https://example.com/page): Brief description of what this page contains
- [Another Page](https://example.com/other): Brief description
## Another Section
- [API Reference](https://docs.example.com/api): Complete REST API documentation
Format rules
- H1 (`#`): The site or project name. Required. Appears once at the top.
- Blockquote (`>`): A single-sentence description. Required. Immediately follows the H1.
- Introductory paragraph: Optional. Provides context. Comes after the blockquote.
- H2 (`##`) sections: Logical groupings of content. Use descriptive labels like "Documentation", "Guides", "API Reference", "About", "Blog".
- List items: Each item is a Markdown link with an optional inline description after a colon. The description should be 1-2 sentences explaining what the page contains, not just restating the title.
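These rules are mechanical enough to lint before deploying. A minimal structural check - a sketch against the community spec, not an official validator:

```python
import re

def validate_llms_txt(text: str) -> list[str]:
    """Return a list of structural problems in an llms.txt file (empty = OK)."""
    lines = [line for line in text.splitlines() if line.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("missing H1 title on first line")
    if len(lines) < 2 or not lines[1].startswith("> "):
        problems.append("missing blockquote summary after H1")
    link = re.compile(r"^- \[[^\]]+\]\(https?://[^)]+\)")
    if not any(link.match(line) for line in lines):
        problems.append("no markdown link list items found")
    return problems
```

A well-formed file passes all three checks; running the linter in CI catches regressions when the file is regenerated.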
Naming guidance for sections
Good section names:
- "Documentation" - core product/service documentation
- "Getting Started" - onboarding and quickstart content
- "API Reference" - technical API documentation
- "Guides" - how-to and tutorial content
- "About" - company/project information
- "Blog" - articles, engineering posts, release notes
- "Changelog" - version history
Avoid vague section names like "Resources" or "Links" - they provide no signal to AI systems about what the content is.
llms-full.txt variant
The specification also defines an optional /llms-full.txt file. Where llms.txt
contains a curated index of links, llms-full.txt contains the full text content
of the site's most important pages, concatenated.
The purpose of llms-full.txt is to allow AI systems (especially those with large
context windows) to ingest a complete snapshot of a site's content in a single request,
rather than following individual links.
When to use llms-full.txt:
- Documentation sites where AI tools (like IDE assistants) need offline/cached access
- Sites targeting AI coding tools that context-load documentation before answering questions
- Smaller sites where full content fits in a reasonable context window (< 100K tokens)
Format: Plain text or Markdown, sections separated by ---, each section starting
with the source URL as a reference comment:
<!-- Source: https://docs.example.com/quickstart -->
# Quickstart Guide
[Full content of the quickstart page here]
---
<!-- Source: https://docs.example.com/api/auth -->
# Authentication
[Full content of the auth page here]
Practical limit: llms-full.txt files larger than ~1-2MB become impractical for
most AI consumption scenarios. Prioritize the most commonly needed content rather than
attempting to include everything.
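Generating the file is mostly concatenation using the separator and source-comment convention shown above, plus a size guard for the practical limit. A hypothetical helper (the function name and signature are illustrative):

```python
def build_llms_full(
    pages: list[tuple[str, str]], max_bytes: int = 2_000_000
) -> str:
    """Concatenate (source_url, markdown_body) pages into llms-full.txt content,
    stopping before the cumulative size exceeds the practical ~2MB limit."""
    sections, size = [], 0
    for url, body in pages:
        section = f"<!-- Source: {url} -->\n{body.strip()}"
        size += len(section.encode("utf-8"))
        if size > max_bytes:
            break  # pages should be ordered most-important-first
        sections.append(section)
    return "\n\n---\n\n".join(sections)
```

Because the loop stops at the limit, ordering the input list by importance ensures the most commonly needed content always makes it into the file.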
Relationship to robots.txt
LLMs.txt and robots.txt serve complementary roles:
- `robots.txt` controls which crawlers can access which paths. If you block GPTBot in `robots.txt`, those pages won't appear in ChatGPT Search regardless of what `llms.txt` says.
- `llms.txt` guides AI systems that have already been granted access (i.e., not blocked by `robots.txt`) toward the most relevant content.
Common configuration mistake: Adding an llms.txt file while simultaneously
blocking AI crawlers in robots.txt. The llms.txt will be ignored because the
crawlers can't reach it or won't trust the guidance from a site that blocks them.
Recommended robots.txt for GEO-friendly configuration:
User-agent: *
Allow: /
User-agent: Googlebot
Allow: /
User-agent: GPTBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: bingbot
Allow: /
Only add Disallow rules for paths that should genuinely not be indexed (admin panels,
authenticated content, private APIs, etc.).
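Before deploying a robots.txt change, you can verify that none of the AI crawlers above are accidentally blocked using Python's standard urllib.robotparser:

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Googlebot"]

def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that the given robots.txt would block from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]
```

An empty result means every listed crawler can fetch the URL; anything returned is a misconfiguration to fix before the llms.txt guidance can matter.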
Implementation guide
Step 1 - Identify your key content
Before writing llms.txt, inventory your site's most important pages by category:
- Core documentation or product pages (highest priority)
- Getting started / quickstart content (high priority - most commonly queried)
- API reference or technical specifications (high priority for technical audiences)
- Blog posts and guides (medium priority - selective, pick evergreen content)
- About / company pages (lower priority - include briefly)
Aim for 15-40 links total. Too few provides insufficient guidance; too many dilutes the signal and starts to look like a sitemap rather than a curated index.
Step 2 - Write descriptions
For each link, write a description that tells an AI system what the page answers, not just what topic it covers. Compare:
# Weak descriptions
- [Authentication](https://docs.example.com/auth): Authentication information
- [API Limits](https://docs.example.com/limits): Rate limits
# Strong descriptions
- [Authentication](https://docs.example.com/auth): How to generate API keys,
implement OAuth 2.0 flows, and verify webhook signatures. Includes code
examples for Node.js and Python.
- [API Limits](https://docs.example.com/limits): Rate limit tiers by plan,
429 error handling, and best practices for burst traffic management.
Step 3 - Deploy the file
Place llms.txt at the root of your web server so it is accessible at
https://yourdomain.com/llms.txt. Ensure:
- Content-type is `text/plain` or `text/markdown`
- File is accessible without authentication
- File is NOT listed in robots.txt Disallow rules
- File is served over HTTPS
For static sites, most frameworks support this by placing the file in the public/
or static/ directory. For dynamic sites, add a route that serves the file.
Step 4 - Add llms-full.txt (optional)
If your site is documentation-heavy and you want to support AI tools that load content
rather than follow links, generate llms-full.txt by concatenating the Markdown source
of your most important pages. Many documentation frameworks (Docusaurus, GitBook, Mintlify)
have plugins or scripts that can generate this automatically.
Full example implementations
Developer documentation site (e.g., API product)
# Acme API Documentation
> RESTful payment processing API for e-commerce and SaaS platforms.
Acme's API enables developers to accept payments, manage subscriptions, and
handle payouts. This documentation covers authentication, all API endpoints,
error handling, and SDK usage.
## Getting Started
- [Quickstart](https://docs.acme.com/quickstart): Process your first payment
in under 5 minutes. Covers API key setup, a test charge, and response handling.
- [Core Concepts](https://docs.acme.com/concepts): Mental model for Acme's data
objects (charges, customers, subscriptions, payment methods) and how they relate.
## Authentication
- [API Keys](https://docs.acme.com/auth/api-keys): Creating and rotating API keys,
test vs live mode, key scoping for least-privilege access.
- [OAuth 2.0](https://docs.acme.com/auth/oauth): Implementing OAuth for third-party
integrations. Authorization code flow, refresh tokens, and scopes reference.
- [Webhook Signatures](https://docs.acme.com/auth/webhooks): Verifying webhook
payloads using HMAC-SHA256. Required for production webhook security.
## API Reference
- [Charges](https://docs.acme.com/api/charges): Create, capture, and refund charges.
Full parameter reference and response schema.
- [Customers](https://docs.acme.com/api/customers): Customer object CRUD operations,
payment method attachment, and customer portal configuration.
- [Subscriptions](https://docs.acme.com/api/subscriptions): Subscription lifecycle,
proration, trial periods, and cancellation handling.
- [Webhooks](https://docs.acme.com/api/webhooks): Event types reference, payload
schemas, retry behavior, and testing with the CLI.
## SDKs
- [Node.js SDK](https://docs.acme.com/sdk/node): Installation, initialization,
and complete method reference for the official Node.js library.
- [Python SDK](https://docs.acme.com/sdk/python): Installation and method reference
for the official Python library.
## About
- [Company](https://acme.com/about): Acme Inc. is a YC-backed payments infrastructure
company founded in 2021, processing $2B+ annually.
- [Status](https://status.acme.com): Real-time API uptime and incident history.
Content/Marketing site
# Acme Marketing Blog
> Data-driven marketing strategies and case studies for B2B SaaS companies.
## Popular Guides
- [Email Deliverability Guide](https://acme.com/guides/email-deliverability): Complete
guide to improving inbox placement rates, covering SPF/DKIM/DMARC setup,
list hygiene, and warm-up strategies.
- [SaaS Pricing Playbook](https://acme.com/guides/saas-pricing): Framework for
value-based pricing, packaging strategies, and pricing page optimization with
A/B test data from 50+ SaaS companies.
## Research & Data
- [2024 B2B Marketing Report](https://acme.com/research/2024-b2b-marketing): Annual
survey of 500 B2B marketing leaders on budget allocation, channel performance,
and AI tool adoption.
## About
- [About Acme](https://acme.com/about): Marketing analytics platform helping
B2B SaaS companies attribute revenue to content and campaigns.
Adoption status and AI engine support
As of early 2025:
- Anthropic (Claude): Has stated support for the `llms.txt` concept and crawls these files. Claude's web browsing uses llms.txt as a navigation signal.
- OpenAI (ChatGPT): No official statement, but GPTBot crawls and likely processes the files as structured text.
- Perplexity: PerplexityBot crawls llms.txt files and the team has indicated awareness of the spec.
- Google: No official statement as of early 2025. Googlebot indexes the file as a regular crawled page, but specific llms.txt-aware behavior in AI Overviews has not been confirmed.
- Developer tools (GitHub Copilot, Cursor, Codeium): Several AI coding tools have implemented llms.txt support for loading documentation context.
Spec home: The canonical spec and a registry of sites with llms.txt are maintained
at https://llmstxt.org (community site, not a formal standards body).
Frequently Asked Questions
What is geo-optimization?
Use this skill when optimizing for AI-powered search engines and generative search results - Google AI Overviews, ChatGPT Search (SearchGPT), Perplexity, Microsoft Copilot Search, and other LLM-powered answer engines. Covers Generative Engine Optimization (GEO), citation signals for AI search, entity authority, LLMs.txt specification, and LLM-friendliness patterns based on Princeton GEO research. Triggers on visibility in AI search, getting cited by LLMs, or adapting SEO for the AI search era.
How do I install geo-optimization?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill geo-optimization in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support geo-optimization?
geo-optimization works with claude-code, gemini-cli, openai-codex, mcp. Install it once and use it across any supported AI coding agent.