geo-optimization
Use this skill when optimizing for AI-powered search engines and generative search results - Google AI Overviews, ChatGPT Search (SearchGPT), Perplexity, Microsoft Copilot Search, and other LLM-powered answer engines. Covers Generative Engine Optimization (GEO), citation signals for AI search, entity authority, LLMs.txt specification, and LLM-friendliness patterns based on Princeton GEO research. Triggers on visibility in AI search, getting cited by LLMs, or adapting SEO for the AI search era.
geo-optimization
geo-optimization is a production-ready AI agent skill for claude-code, gemini-cli, openai-codex, and mcp. It covers optimizing for AI-powered search engines and generative search results - Google AI Overviews, ChatGPT Search (SearchGPT), Perplexity, Microsoft Copilot Search, and other LLM-powered answer engines.
Quick Facts
| Field | Value |
|---|---|
| Category | marketing |
| Version | 0.1.0 |
| Platforms | claude-code, gemini-cli, openai-codex, mcp |
| License | MIT |
How to Install
- Make sure you have Node.js installed on your machine.
- Run the following command in your terminal:
npx skills add AbsolutelySkilled/AbsolutelySkilled --skill geo-optimization
- The geo-optimization skill is now available in your AI coding agent (Claude Code, Gemini CLI, OpenAI Codex, etc.).
Overview
Generative Engine Optimization (GEO) is the emerging discipline of optimizing content so that AI-powered search engines cite it in their synthesized answers. Unlike traditional SEO - where success means ranking a blue link on page one - GEO success means getting your content quoted, paraphrased, or linked inside an AI-generated response from Google AI Overviews, ChatGPT Search, Perplexity, or Microsoft Copilot Search.
This field is nascent and evolving fast. The foundational research (notably Princeton's 2023 GEO paper) provides early empirical evidence, but best practices are still being discovered in the wild. Treat every strategy here as a working hypothesis subject to revision as AI search products mature, change their retrieval logic, and shift their citation behaviors.
Important: GEO supplements traditional SEO - it does not replace it. AI search engines primarily cite pages that already have domain authority and ranking signals. A strong traditional SEO foundation is a prerequisite, not an alternative.
Tags
seo geo generative-search ai-overviews chatgpt-search perplexity llms-txt
Platforms
- claude-code
- gemini-cli
- openai-codex
- mcp
Frequently Asked Questions
What is geo-optimization?
Use this skill when optimizing for AI-powered search engines and generative search results - Google AI Overviews, ChatGPT Search (SearchGPT), Perplexity, Microsoft Copilot Search, and other LLM-powered answer engines. Covers Generative Engine Optimization (GEO), citation signals for AI search, entity authority, LLMs.txt specification, and LLM-friendliness patterns based on Princeton GEO research. Triggers on visibility in AI search, getting cited by LLMs, or adapting SEO for the AI search era.
How do I install geo-optimization?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill geo-optimization in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support geo-optimization?
This skill works with claude-code, gemini-cli, openai-codex, mcp. Install it once and use it across any supported AI coding agent.
Maintainers
Generated from AbsolutelySkilled
SKILL.md
Generative Engine Optimization (GEO)
When to use this skill
Trigger this skill when the task involves:
- Improving visibility in AI search results (Google AI Overviews, ChatGPT Search, Perplexity)
- Getting cited by LLMs when users ask questions relevant to your domain
- Auditing content for AI search citability
- Implementing a /llms.txt file to make site content AI-readable
- Optimizing entity presence so AI engines recognize your brand or product authoritatively
- Structuring content for AI extraction (definitions, statistics, expert quotes)
- Understanding why competitors appear in AI Overviews and you do not
- Adapting an existing SEO content strategy for the generative search era
Do NOT trigger this skill for:
- Traditional SERP ranking (blue-link SEO) - use a dedicated SEO skill
- Technical crawlability issues (robots.txt, sitemaps, Core Web Vitals) - those are prerequisites to GEO, not GEO itself
Key principles
Entity authority matters more than page authority in AI search. AI engines build knowledge graphs. Being recognized as an authoritative entity (brand, person, concept) across Wikipedia, Wikidata, structured data markup, and consistent web mentions increases citation probability more than raw domain authority alone.
Citability over clickability. Traditional SEO optimizes the title/meta for click-through. GEO optimizes the content body for AI extraction. Write content that can be quoted verbatim - specific, attributable, factually dense claims.
Statistics, data, and expert quotes increase citation probability. Princeton's GEO research found that adding authoritative statistics, citing sources within content, and including expert quotations improved AI citation rates by 30-40% in controlled experiments. Data-backed claims are preferred over opinion.
LLMs.txt makes your content explicitly available for AI consumption. The /llms.txt specification (inspired by robots.txt) provides a structured, curated entry point that AI crawlers can use to understand your site's content hierarchy without guessing.
GEO supplements traditional SEO; it does not replace it. AI Overviews pull from pages that already rank. Strong backlink profiles, E-E-A-T signals, and technical SEO hygiene remain foundational requirements.
Core concepts
How AI search engines work (Retrieval-Augmented Generation)
AI search engines use a Retrieval-Augmented Generation (RAG) architecture. When a user submits a query, the system: (1) retrieves candidate pages using a traditional search index, (2) extracts relevant passages from those pages, (3) passes those passages as context to a large language model, and (4) generates a synthesized answer with citations.
This means two things: your page must be indexable and retrievable (traditional SEO), AND the extracted passage must be clear, specific, and quotable enough for the LLM to use it (GEO).
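The four-step flow above can be sketched as a toy pipeline. This is purely illustrative - real engines use learned rankers and embedding retrieval, not keyword overlap, and the "LLM" here is a stand-in function; all names and URLs are invented for the example.

```python
# Toy sketch of the RAG pipeline behind AI search (illustrative only).

def retrieve(query, index):
    """Step 1: score candidate pages by naive keyword overlap."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(page["text"].lower().split())), page)
              for page in index]
    return [p for score, p in sorted(scored, key=lambda s: -s[0]) if score]

def extract_passages(pages, max_passages=3):
    """Step 2: take the most relevant passage from each candidate page."""
    return [{"url": p["url"], "passage": p["text"]} for p in pages[:max_passages]]

def generate_answer(query, passages):
    """Steps 3-4: stand-in for the LLM call -- synthesize with citations."""
    cited = ", ".join(p["url"] for p in passages)
    return f"Answer to {query!r} synthesized from: {cited}"

index = [
    {"url": "https://docs.example.com/caching",
     "text": "Redis caching stores hot data in memory"},
    {"url": "https://example.com/blog",
     "text": "Our company history and mission"},
]
query = "how does Redis caching work"
answer = generate_answer(query, extract_passages(retrieve(query, index)))
print(answer)
```

Note how the blog page never reaches the generation step: if retrieval does not surface your page, the LLM cannot cite it, which is why the traditional-SEO prerequisite keeps coming up.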
The citation mechanism
When an AI engine cites a source, it has determined that a passage from that page best answers part of the query. Citation selection is influenced by:
- Semantic relevance of the passage to the query
- Source domain authority and trustworthiness signals
- Content structure (well-delimited claims are easier to extract)
- Presence of unique data or authoritative attribution
Entity recognition and knowledge graphs
AI engines maintain implicit knowledge graphs. When they process a query about "Stripe payments" they recognize Stripe as an entity with known attributes. If your content is consistently associated with an entity (through schema.org markup, Wikipedia mentions, and consistent naming across the web), the AI engine is more likely to trust and cite your content on topics related to that entity.
Princeton GEO research findings
The 2023 Princeton GEO paper tested nine optimization strategies on a benchmark of 10,000 queries across Bing, Google, and Perplexity. Key findings:
- Adding authoritative statistics increased citation by ~40%
- Citing reputable sources within content increased citation by ~30%
- Using an authoritative/confident tone improved inclusion rates
- Adding expert quotations improved results in informational content
- Fluency improvements (fixing grammar/clarity) had modest but consistent gains
- Simply adding more keywords did not significantly improve citation rates
AI Overviews vs traditional featured snippets
Google's featured snippets (position zero) are extracted verbatim from a single page. AI Overviews synthesize across multiple sources and rewrite the content. This means a single authoritative source can no longer monopolize a topic - GEO requires building authority across a content cluster, not just a single optimized page.
Common tasks
Audit content for AI search citability
Walk through each piece of content and check:
- Claims specificity - Replace "our tool improves performance" with "our tool reduced average page load time by 340ms in A/B testing across 50,000 sessions."
- Source attribution - Cite third-party studies, reports, or standards when making claims. "According to the 2024 State of DevOps Report..."
- Structure clarity - Ensure definitions, how-tos, and comparisons are in clearly delimited sections with descriptive headings. AI extractors favor self-contained paragraphs that answer a question completely.
- Entity consistency - Does your brand/product name appear consistently across the page, schema markup, and linked social/Wikipedia pages?
Scoring rubric (use as checklist):
- Every major claim has a specific data point or source
- Page has schema.org markup (Article, Organization, FAQPage, or HowTo)
- At least one expert quote or attributed statement per major section
- Headings are question-answering, not just topical ("How does X work?" not "About X")
- Entity name consistent in content, title, schema, and URL
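The rubric above can be partially automated with crude heuristics. The sketch below is an assumption-laden starting point, not a real GEO tool - the regex patterns and the sample page are illustrative, and a real audit would also check schema markup and entity naming.

```python
import re

# Rough heuristic audit for three rubric items: statistics present,
# claims attributed, and question-style headings.

def audit_citability(text: str) -> dict:
    return {
        # specific numbers with units suggest data-backed claims
        "has_statistics": bool(
            re.search(r"\b\d+(\.\d+)?\s*(%|ms|percent)", text)),
        # attribution phrases suggest sourced claims
        "has_attribution": bool(
            re.search(r"(?i)\baccording to\b|\breports? that\b|\bper the\b", text)),
        # question-style H2/H3 headings extract better than topical ones
        "question_heading": bool(
            re.search(r"(?im)^#{2,3}\s+(how|what|why|when)\b", text)),
    }

page = """## How does caching work?
According to the 2024 benchmark, caching cut p95 latency by 42%."""
print(audit_citability(page))
```

Run it over a content inventory and flag any page where all three checks fail - those are the citation-invisible pages worth updating first.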
Add citation-boosting elements
Statistics pattern:
Before: "Many companies struggle with cloud costs."
After: "According to Gartner's 2024 Cloud Report, 73% of enterprises exceeded their cloud budgets in the prior fiscal year."
Expert quote pattern:
Before: "Security is critical in modern APIs."
After: "As OWASP notes in its API Security Top 10: 'Broken object-level authorization is the most commonly exploited API vulnerability, affecting an estimated 40% of production APIs.'"
Definition pattern (high citability):
[TERM] is [concise, complete definition]. [One-sentence elaboration with a specific example or data point].
Definitions that are clear and complete in a single paragraph are frequently cited verbatim by AI engines answering "what is X" queries.
Implement an LLMs.txt file
Create /llms.txt at your site root. This file signals to AI crawlers what your site
contains and where to find authoritative content. See references/llms-txt-spec.md
for the full specification.
Minimal working example:
# Acme Developer Docs
> API documentation for Acme's payment processing platform.
## Documentation
- [API Reference](https://docs.acme.com/api): Full REST API reference with all endpoints
- [Quickstart](https://docs.acme.com/quickstart): Get your first payment running in 5 minutes
- [Authentication](https://docs.acme.com/auth): API keys, OAuth 2.0, webhook signatures
- [SDKs](https://docs.acme.com/sdks): Official libraries for Node.js, Python, Ruby, Go
## About
- [Company](https://acme.com/about): About Acme and our mission
- [Blog](https://acme.com/blog): Engineering and product updates
Deploy at https://yourdomain.com/llms.txt. Ensure it is accessible to crawlers (not blocked by robots.txt).
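Before deploying, a quick structural check catches the common mistakes (missing H1 title, missing summary, malformed link lines). This sketch assumes the minimal shape shown above - one H1, an optional "> " summary, and "- [name](url): description" links - which matches the example but is not a full implementation of the spec.

```python
import re

# Minimal llms.txt structural check (a sketch, not a spec-complete validator).

def validate_llms_txt(text: str) -> dict:
    lines = [l for l in text.splitlines() if l.strip()]
    links = re.findall(r"^- \[([^\]]+)\]\((https?://[^)]+)\)", text, re.M)
    return {
        "has_title": bool(lines) and lines[0].startswith("# "),
        "has_summary": any(l.startswith("> ") for l in lines),
        "link_count": len(links),
        "urls": [url for _name, url in links],
    }

sample = """# Acme Developer Docs
> API documentation for Acme's payment processing platform.

## Documentation
- [API Reference](https://docs.acme.com/api): Full REST API reference
- [Quickstart](https://docs.acme.com/quickstart): First payment in 5 minutes
"""
report = validate_llms_txt(sample)
print(report)
```

The extracted URL list is also a handy input for the next check that matters more: confirming each linked page is actually indexed and crawlable.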
Optimize entity presence
Entity authority is built through consistent signals across the web:
- Wikipedia/Wikidata - Create or improve entries for your brand, product, or founders where notable. AI engines heavily weight Wikipedia as a trusted entity source.
- Schema.org markup - Add Organization, Product, Person, or SoftwareApplication schema to relevant pages. This explicitly tells crawlers what entities exist on your site.
- Consistent NAP - Name, Address, Phone (for local entities) must be identical across Google Business Profile, LinkedIn, Crunchbase, and your site.
- Knowledge panel - If a Google Knowledge Panel exists for your entity, claim it and ensure the data is accurate. This feeds into AI Overview entity recognition.
- Cross-domain mentions - Earn mentions and links from authoritative domains in your category. AI engines use co-citation patterns to build entity authority.
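An Organization schema block with sameAs links is the single markup change that ties these signals together. The example below generates the JSON-LD; every name and URL is a placeholder - substitute your one canonical entity name and your real profile URLs.

```python
import json

# Illustrative Organization JSON-LD (schema.org). All values are placeholders.

org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Acme",                      # one canonical name, used everywhere
    "url": "https://acme.com",
    "logo": "https://acme.com/logo.png",
    "sameAs": [                          # ties the entity to external profiles
        "https://en.wikipedia.org/wiki/Acme",
        "https://www.linkedin.com/company/acme",
        "https://www.crunchbase.com/organization/acme",
    ],
}
jsonld = json.dumps(org, indent=2)
print(f'<script type="application/ld+json">\n{jsonld}\n</script>')
```

Place the script tag in the head of your homepage (and keep the name field byte-identical to the name used in content and titles).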
Structure content for AI extraction
AI extractors prefer content that is:
- Self-contained: A single paragraph should fully answer the sub-question without requiring the reader to read the entire article for context.
- Scannable with semantic headings: Use H2/H3 headings phrased as questions or clear topic labels. "How does caching work in Redis?" outperforms "Caching" as a heading.
- Table-friendly for comparisons: Comparison data in tables (with clear column headers) is highly extractable. AI engines frequently synthesize comparison answers from tables.
- FAQPage schema for Q&A content: If your page answers multiple distinct questions, add FAQPage schema markup. This gives the AI direct access to the Q/A pairs.
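For the FAQPage case, the markup is mechanical enough to generate from your Q/A pairs. A sketch, with placeholder question and answer text:

```python
import json

# Illustrative FAQPage JSON-LD built from Q/A pairs (placeholder content).

faqs = [
    ("What is GEO?",
     "Generative Engine Optimization is the practice of optimizing content "
     "so that AI search engines cite it in their synthesized answers."),
    ("Does GEO replace SEO?",
     "No. AI engines cite pages that already rank, so traditional SEO "
     "remains a prerequisite."),
]
faq_page = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {"@type": "Question", "name": q,
         "acceptedAnswer": {"@type": "Answer", "text": a}}
        for q, a in faqs
    ],
}
print(json.dumps(faq_page, indent=2))
```

Each acceptedAnswer should be the same self-contained paragraph that appears in the visible page body - schema that diverges from visible content risks being ignored.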
Monitor AI search visibility
The tooling ecosystem for GEO monitoring is immature as of early 2025. Available approaches:
Manual spot-checking (free, reliable):
- Search your target queries in ChatGPT (web browsing mode), Perplexity, and Google (for AI Overviews) regularly
- Note which competitors are cited and what passage is being pulled from their pages
- Identify the content patterns those passages share
Emerging tools (validate independently - landscape is changing fast):
- Semrush, Ahrefs, and BrightEdge are developing AI search visibility features
- AI Rank trackers like Rankscale or similar tools may track AI citation presence
- Manual Perplexity search with "site:" filtering can help audit your domain's presence
Baseline tracking: Build a spreadsheet of 20-50 target queries. For each, record monthly whether your domain appears in AI Overviews, ChatGPT Search, and Perplexity results. Track the trend.
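The baseline-tracking spreadsheet is simple enough to generate programmatically: one row per (query, engine) per month, recording whether your domain was cited during the manual spot check. The engine names and query below are examples.

```python
import csv
import datetime
import io

# Sketch of the baseline-tracking sheet described above.

ENGINES = ["google_ai_overviews", "chatgpt_search", "perplexity"]

def record_snapshot(queries, observations, month=None):
    """observations: {(query, engine): True/False} from manual spot checks."""
    month = month or datetime.date.today().strftime("%Y-%m")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["month", "query", "engine", "domain_cited"])
    for q in queries:
        for e in ENGINES:
            writer.writerow([month, q, e, observations.get((q, e), False)])
    return buf.getvalue()

sheet = record_snapshot(
    ["what is generative engine optimization"],
    {("what is generative engine optimization", "perplexity"): True},
    month="2025-01",
)
print(sheet)
```

Append each month's snapshot to one CSV and the trend per engine falls out of a pivot table.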
Adapt existing content strategy for GEO
For teams with established SEO content programs:
- Prioritize data-rich content - Commission or publish original research, surveys, and benchmark reports. Original data is a citation magnet for AI engines.
- Update thin content - Pages that rank but lack specific data are citation-invisible to AI. Audit top-ranking pages and add statistics, quotes, and definitions.
- Build content clusters with entity focus - Rather than isolated posts, build clusters of 5-10 articles around a single entity or concept, with strong internal linking. AI engines recognize topical authority through cluster density.
- Add author entity markup - If content is from a recognized expert, add author schema with sameAs links to their LinkedIn, Google Scholar, or Wikipedia. Author authority feeds into E-E-A-T signals that AI engines evaluate.
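The author markup looks like this in practice. The headline, name, and profile URLs below are placeholders - use the author's real, consistently spelled name and their actual profiles.

```python
import json

# Illustrative Article JSON-LD with author sameAs links (placeholder values).

article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How GEO differs from SEO",
    "datePublished": "2025-01-15",
    "author": {
        "@type": "Person",
        "name": "Jane Doe",
        "sameAs": [
            "https://www.linkedin.com/in/janedoe",
            "https://scholar.google.com/citations?user=janedoe",
        ],
    },
}
print(json.dumps(article, indent=2))
```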
Anti-patterns
| Anti-pattern | Why it fails |
|---|---|
| Optimizing only for AI search, ignoring traditional SEO | AI engines cite pages that already rank. Without indexing and authority, GEO efforts are invisible. |
| Blocking AI crawlers in robots.txt | Disallowing Googlebot, GPTBot, PerplexityBot, or ClaudeBot removes you from AI search entirely. Audit which bots you are and are not blocking. |
| Stuffing fake or unverifiable statistics | AI engines and human readers both lose trust. Fabricated data backfires badly if cited and then fact-checked. |
| Inconsistent entity naming | Referring to your product as "Acme", "Acme.io", and "The Acme Platform" in different places dilutes entity recognition. Pick one canonical name. |
| Treating GEO techniques as stable | The field is evolving month by month. What works today on Perplexity may not work on next year's Google AI Overviews. Revisit strategy quarterly. |
| One-page GEO fix ("just add llms.txt") | LLMs.txt alone does not create citations. It is one signal among many. Entity authority and content quality matter far more. |
| Assuming AI search replaces traditional search traffic | Most search volume still flows through traditional results. Zero-click AI answers may reduce some traffic; the net impact is still being measured. |
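The crawler-blocking anti-pattern above is easy to check mechanically with Python's standard-library robots.txt parser, using only the file's text. The user-agent tokens below are the published AI crawler names as of early 2025; the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Check whether common AI crawlers may fetch a URL under a given robots.txt.

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def audit_robots(robots_txt: str, url: str = "https://example.com/") -> dict:
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, url) for bot in AI_BOTS}

# A blanket "block everything" rule silently excludes every AI crawler:
blanket = "User-agent: *\nDisallow: /\n"
print(audit_robots(blanket))
```

Run this against your live robots.txt before any other GEO work - a leftover blanket Disallow makes everything else in this skill irrelevant.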
Gotchas
Blocking AI crawlers in robots.txt removes you from AI search entirely - GPTBot, ClaudeBot, PerplexityBot, and Google-Extended are AI crawler user agents. A blanket "User-agent: *" with "Disallow: /" or a past "block all bots" rule may be excluding all AI crawlers silently. Audit robots.txt before any GEO effort - being unindexable is the failure mode that makes all other GEO work irrelevant.
Adding statistics without sources backfires worse than having none - AI engines cross-reference claims against their training data. Fabricated or unsourced statistics that conflict with known data cause the content to be scored as low-trust and excluded from citations. Every data point must link to a verifiable primary source (report, study, official dataset).
LLMs.txt helps AI crawlers but doesn't help if the pages aren't indexed - LLMs.txt is a navigation aid, not a crawling permission grant. If the pages it points to are blocked by robots.txt, return errors, or are not indexed in Google, AI systems that use traditional search indices to retrieve content will never see them. Fix indexing and crawlability first.
Entity inconsistency across brand touchpoints dilutes knowledge graph recognition - If your product is called "Acme" on your website, "Acme.io" in press mentions, and "The Acme Platform" in your schema markup, AI engines build three weak entity nodes instead of one strong one. Standardize the canonical brand name across every mention, schema field, and social profile before building entity authority.
GEO strategies are engine-specific and change frequently - What increases citation probability on Perplexity today may not affect Google AI Overviews, and both may change their retrieval behavior within months. Never apply GEO tactics without specifying which engine you are targeting, and revisit your strategy at least quarterly.
References
Load these files when going deeper on specific topics:
- references/ai-search-engines.md - How each AI search engine works (Google AI Overviews, ChatGPT Search, Perplexity, Copilot Search), citation patterns, and what increases inclusion probability per engine. Load when engine-specific strategy is needed.
- references/citation-signals.md - Princeton GEO research findings in detail, full list of citation-boosting signals, entity authority factors, structured data impact. Load when auditing content or building a GEO optimization checklist.
- references/llms-txt-spec.md - Full LLMs.txt specification: format, syntax, what to include, relationship to robots.txt, llms-full.txt variant, adoption status, and example implementations. Load when implementing or advising on LLMs.txt.
References
ai-search-engines.md
AI Search Engines - GEO Reference
How each major AI search engine works, what triggers citations, and what content signals increase the probability of being included. Note that all AI search products are actively evolving - retrieval logic, citation formats, and ranking factors change with model updates. Treat everything here as current as of early 2025.
Google AI Overviews
What it is
Google AI Overviews (formerly Search Generative Experience / SGE) is Google's AI-generated answer panel that appears above organic results for qualifying queries. It synthesizes a multi-paragraph response by pulling from multiple pages and cites sources inline with expandable links.
What triggers AI Overviews
AI Overviews appear on:
- Informational and research queries ("how does X work", "what is the best Y")
- Multi-faceted questions that traditionally required clicking multiple results
- Comparison queries ("X vs Y", "pros and cons of Z")
- How-to and step-by-step queries
AI Overviews generally do NOT appear on:
- Navigational queries (brand name lookups, "login to Gmail")
- Local queries (Google favors Maps/Local Pack)
- Recent news (real-time freshness requirement that AI Overviews cannot meet)
- YMYL (Your Money Your Life) queries where accuracy risk is high - Google is conservative
How citations are selected
Google's AI Overviews pull from pages that already rank in the top ~20 organic results. If your page doesn't rank on page 1-2 for a query, it is very unlikely to be cited in the AI Overview for that query. This is the single most important implication for GEO: AI Overviews are a layer on top of traditional ranking, not a replacement for it.
Within eligible pages, Google's extraction logic favors:
- Self-contained passages that directly answer the query question
- Content with clear, unambiguous factual claims
- Structured content (numbered lists, tables, definition blocks)
- Pages with strong E-E-A-T signals (experience, expertise, authoritativeness, trust)
- Schema.org markup that explicitly defines content type (HowTo, FAQPage, Article)
What increases citation probability on Google
- Rank first - Appear in organic results for the target query (prerequisite)
- FAQPage schema - Explicitly structured Q&A markup directly feeds AI Overview extraction
- HowTo schema - Step-by-step content with HowTo markup is frequently cited in procedural AI Overviews
- Authoritative statistics - Specific, sourced data points are frequently lifted verbatim
- Clear topic headings - H2/H3 headers phrased as questions match query intent directly
- E-E-A-T signals - Author bios with credentials, publication dates, sources cited in the body of the content
Known limitations and behaviors
- Google does not guarantee citation even if you rank #1 - it picks the best passage, not necessarily from the top result
- AI Overview citations can change day to day as Google updates the underlying model
- Google has been observed pulling from pages that rank position 10-20 if those pages have a more precisely matching passage than the top result
- Content blocked from Googlebot indexing will not appear in AI Overviews
ChatGPT Search (SearchGPT)
What it is
ChatGPT Search (previously called SearchGPT, now integrated into ChatGPT's browsing mode) is OpenAI's real-time web search feature. When a user's query requires current information or web sources, ChatGPT triggers a web search, retrieves pages, and synthesizes a response with cited sources shown in a sidebar.
How content gets discovered
ChatGPT Search uses Microsoft Bing's index as its primary search backend (as of 2024). This means:
- Pages that rank in Bing have significantly higher chances of being retrieved
- Bing Webmaster Tools submission is important for ChatGPT Search visibility
- GPTBot (OpenAI's crawler) also separately crawls content for model training and potentially for search retrieval - do not block it unless you have a specific reason
Citation format
ChatGPT Search shows inline citation numbers [1] in the response text, with source cards in a right-panel sidebar showing title, domain, and snippet. Users can expand individual sources to read more. This means your page title and meta description matter for click-through even within AI search - users see them in the citation card.
What increases citation probability
- Bing ranking - Strong Bing SEO is the primary lever. Many sites over-index on Google and under-optimize for Bing. Submit to Bing Webmaster Tools, ensure Bingbot is not blocked.
- Content freshness - ChatGPT Search has a recency bias for news and time-sensitive topics. Updated publication dates and fresh content signal relevance.
- Clear, factual prose - ChatGPT's retrieval favors content that can be summarized accurately. Ambiguous or heavily opinionated content is cited less.
- Structured answers - Well-organized content with clear answer sections is pulled more reliably than content buried in long narratives.
- Domain trustworthiness - News sites, official documentation, and established domains are cited more frequently than new or thin-authority sites.
GPTBot crawler
OpenAI's crawler is GPTBot. Allow it unless you have a specific reason not to:
# robots.txt - allow GPTBot
User-agent: GPTBot
Allow: /
Blocking GPTBot may reduce inclusion in ChatGPT Search retrieval. If you want to allow crawling but opt out of training data use, OpenAI provides a separate mechanism via their data opt-out process (distinct from the robots.txt signal).
Perplexity
What it is
Perplexity is a standalone AI search engine that positions itself as an "answer engine" rather than a traditional search engine. Every query gets an AI-synthesized response with multiple cited sources displayed in a prominent sources panel. Perplexity has strong adoption among researchers, developers, and technical users.
How content gets discovered
Perplexity uses a combination of:
- Its own PerplexityBot crawler (indexes content directly)
- Bing's index as a fallback for content not yet in its own index
- Real-time search for very recent queries
PerplexityBot crawls aggressively. Ensure it is not blocked:
User-agent: PerplexityBot
Allow: /
Sources panel behavior
Perplexity shows 4-6 sources in a right-column panel for each response. These sources are visible and clickable, making Perplexity one of the better AI search engines for referral traffic. Unlike Google AI Overviews (which can reduce click-through), Perplexity citations actively drive visits.
Pro Search vs standard search
Perplexity Pro Search (available to Pro subscribers) does multi-step research - it generates sub-queries, retrieves multiple rounds of sources, and synthesizes a more comprehensive answer. Content that appears in standard searches will also appear in Pro searches, but Pro Search sometimes retrieves deeper technical content that standard search misses.
What increases citation probability
- Technical and research-heavy content - Perplexity's user base skews technical. Content with depth, specificity, and citations is disproportionately favored.
- Primary sources and original research - Perplexity frequently cites academic papers, official documentation, and original data publications.
- Content freshness - Perplexity Pro Search heavily weights freshness for time-sensitive queries. Keep publication and modification dates accurate.
- Well-structured content - Perplexity's extraction logic handles markdown-style content well. Clear headings, bullet points, and numbered lists extract cleanly.
- PerplexityBot access - Not blocking PerplexityBot is table stakes. Giving it access to all content (not just crawl-allowed pages) increases coverage.
Microsoft Copilot Search
What it is
Microsoft Copilot integrates AI search directly into Bing and the Windows operating system. It shares the Bing index and uses GPT-4-class models (via Microsoft's OpenAI partnership) to generate answers. Copilot appears in Bing's search results, the Microsoft Edge sidebar, Windows search, and Microsoft 365 applications.
How content gets discovered
Copilot Search runs entirely on Bing's index. This means:
- All Bing SEO factors apply directly (Bing Webmaster Tools, Bingbot allowance)
- Bing's trust signals differ slightly from Google's - Bing places more weight on social signals (LinkedIn, Twitter/X engagement) and less on raw backlink count
- Bingbot (and the legacy msnbot user agent) must not be blocked
User-agent: msnbot
Allow: /
User-agent: bingbot
Allow: /
Citation format
Copilot Search shows a generated response at the top of Bing results with citations embedded inline, similar to Google AI Overviews but with Bing's look and feel. Source links are shown below the generated response.
What increases citation probability
- Bing ranking - Primary factor, same as for ChatGPT Search
- Social proof signals - Bing weights LinkedIn presence and social sharing signals. Content that earns social engagement indexes well on Bing.
- Structured data - Bing fully supports schema.org markup and uses it for Copilot citation selection.
- HTTPS and security - Bing has historically penalized non-HTTPS or mixed-content pages more aggressively than Google.
Cross-engine GEO checklist
Apply these to maximize visibility across all four engines:
| Signal | Google AI Overviews | ChatGPT Search | Perplexity | Copilot Search |
|---|---|---|---|---|
| Traditional organic ranking | Critical (Google) | Important (Bing) | Moderate | Critical (Bing) |
| Allow engine crawler | Googlebot | GPTBot + Bingbot | PerplexityBot | Bingbot + msnbot |
| Schema.org markup | High impact | Moderate | Low direct impact | Moderate |
| Content freshness | Moderate | High for news | High | High |
| Statistical claims | High | High | High | High |
| E-E-A-T / author authority | High | Moderate | Moderate | Moderate |
| LLMs.txt | Emerging | Emerging | Emerging | Emerging |
citation-signals.md
Citation Signals - GEO Reference
Detailed coverage of what makes content more likely to be cited by AI search engines, grounded in the Princeton GEO research and extended with observed best practices.
Princeton GEO Research (2023)
Overview
The foundational academic paper on GEO is "GEO: Generative Engine Optimization" by Aggarwal et al. from Princeton University (arXiv:2311.09735, 2023). The paper introduced the term "GEO" and ran controlled experiments measuring how different optimization strategies affected the fraction of AI-generated responses that included content from a given source.
Benchmark methodology
The researchers built a benchmark called GEO-bench with 10,000 queries across multiple domains (healthcare, law, technology, finance, travel) and tested responses from Bing, Google (SGE), and Perplexity. For each query they measured whether a given source was included in the AI-generated response (citation presence) and if so, how prominently.
The "GEO score" they defined = (weighted sum of source citations across a response) / (total response length), measuring both presence and prominence.
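A toy version of that idea can be computed as follows. This is a simplified interpretation, not the paper's exact formula - in particular the position-decay weighting below is an assumption introduced for illustration, and the response sentences are invented.

```python
# Toy GEO-score illustration: the share of an AI response attributable to a
# source, weighted so that earlier (more prominent) citations count more.

def geo_score(response_sentences, source):
    """response_sentences: list of (sentence_text, cited_source) pairs."""
    total = sum(len(text.split()) for text, _ in response_sentences)
    weighted = sum(
        len(text.split()) / (i + 1)          # position-decay weight (assumed)
        for i, (text, src) in enumerate(response_sentences)
        if src == source
    )
    return weighted / total if total else 0.0

response = [
    ("GEO is the practice of optimizing for AI citations.", "yourdomain.com"),
    ("It was introduced by Princeton researchers in 2023.", "arxiv.org"),
]
print(round(geo_score(response, "yourdomain.com"), 3))
```

The takeaway the metric encodes: being the first-cited source for the core of the answer scores higher than a brief mention at the end, even for the same number of words.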
Optimization strategies tested and results
The paper tested nine distinct optimization strategies:
| Strategy | Description | Observed Impact |
|---|---|---|
| Authoritative statistics | Add specific numeric statistics with source attribution | ~40% improvement in citation rate |
| Citing sources | Include references to external authoritative sources within the content | ~30% improvement |
| Quotations | Add direct quotes from recognized experts or authoritative bodies | Positive impact, especially informational queries |
| Easy-to-understand language | Simplify and improve readability/clarity | Small but consistent improvement |
| Fluency optimization | Fix grammar, sentence structure, overall prose quality | Small consistent improvement |
| Adding unique/technical terms | Use domain-specific terminology correctly | Mixed results - better in technical domains |
| Authoritative tone | Write with confident, authoritative voice vs hedged/tentative | Positive impact across domains |
| Keyword stuffing | Add high-frequency query keywords repeatedly | Minimal or negative impact |
| Citing your own sources | Self-reference your own content | Minimal impact, sometimes negative |
Key takeaways from the research
Qualitative signals beat keyword signals. The strategies that worked were about content quality (statistics, sources, quotes, clarity), not keyword manipulation. This aligns with how RAG-based systems work - they care about passage quality, not keyword density.
Domain matters. The effectiveness of each strategy varied by domain. Technical and scientific domains benefited more from statistics and citations. Humanities and lifestyle domains benefited more from fluency and clarity improvements.
The effect is additive. Applying multiple strategies compounds the improvement. A page that adds statistics AND cites external sources AND includes expert quotes outperforms a page that does only one of those things.
Prominence matters, not just presence. Being mentioned briefly at the end of an AI response is less valuable than being the primary cited source at the start. Content that answers the core of the query (not just peripheral aspects) earns more prominent placement.
Core citation signals
1. Specific, attributable statistics
The single strongest individual signal found in research and observed in practice. Statistics must be:
- Specific: "73% of enterprises exceeded cloud budgets" not "many companies overspend"
- Attributed: "According to Gartner's 2024 Cloud Report" or "per the 2023 Stack Overflow Developer Survey"
- Relevant: Directly supports the claim being made, not tangential
Example transformation:
Weak: "API security is an important concern for modern applications."
Strong: "The OWASP API Security Top 10 reports that broken object-level authorization
affects an estimated 40% of production APIs, making it the leading API
vulnerability class."Original data as a citation superpower: Publishing your own survey, benchmark, or dataset creates statistics that only you can provide. AI engines must cite you if they want to reference that data. Annual reports, developer surveys, and industry benchmarks are high-citation-value content investments.
2. External source citations in content
Including citations to authoritative third-party sources within your content signals to AI engines that your content has been researched and verified. Effective citation patterns:
- Link to primary sources (original research, official standards, authoritative reports) not secondary summaries
- Use academic/journalistic citation style: "A 2023 study published in Nature found..."
- Cite the specific section or finding, not just the source
Note: this is citing external sources within your content to boost your own citability - not the same as building backlinks.
3. Expert quotations
Direct quotes from recognized experts, industry organizations, or authoritative bodies are frequently lifted verbatim by AI engines. To maximize effectiveness:
- Attribute clearly: "Dr. Jane Smith, Professor of Computer Science at MIT, explains:"
- Use quotation marks precisely around the quoted text
- Choose quotes that directly answer common questions in your domain
- Include the quote source context (interview, publication, talk)
For non-original content (curating others' quotes): Always link to the original source.
4. Content clarity and readability
AI extraction is easier when content is:
- Written in plain, direct sentences (Flesch-Kincaid grade 8-12 for most topics)
- Free of grammatical errors and run-on sentences
- Structured so each paragraph addresses one idea
- Free of excessive hedging ("might possibly", "could perhaps") - an authoritative tone signals confident expertise
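The readability target above can be spot-checked programmatically. The sketch below estimates a Flesch-Kincaid grade level; the syllable counter is a rough vowel-group heuristic, not a linguistically exact implementation:

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups; every word has at least one syllable."""
    count = len(re.findall(r"[aeiouy]+", word.lower()))
    # A trailing silent 'e' usually does not add a syllable ("cite", "pace").
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """FK grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return round(
        0.39 * (len(words) / len(sentences))
        + 11.8 * (syllables / len(words))
        - 15.59,
        1,
    )
```

Run it against a draft paragraph and aim for a result in the 8-12 range; dense jargon pushes the score sharply upward.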
5. Schema.org structured data
Schema markup provides machine-readable signals about content structure and entity relationships. High-impact schema types for GEO:
FAQPage:
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [{
"@type": "Question",
"name": "What is Generative Engine Optimization?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Generative Engine Optimization (GEO) is the practice of optimizing..."
}
}]
}
FAQPage schema gives AI engines direct access to Q&A pairs without requiring extraction. This is arguably the highest-leverage structured data type for GEO.
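Because FAQPage markup is repetitive, it is often easiest to generate it from existing Q&A content rather than hand-writing JSON. A minimal sketch in Python (the helper name and structure are illustrative, not from any official library):

```python
import json

def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build a FAQPage JSON-LD string from (question, answer) pairs."""
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": "FAQPage",
            "mainEntity": [
                {
                    "@type": "Question",
                    "name": question,
                    "acceptedAnswer": {"@type": "Answer", "text": answer},
                }
                for question, answer in pairs
            ],
        },
        indent=2,
    )
```

Embed the output in a `<script type="application/ld+json">` tag on the page that actually displays the same Q&A content, since markup that does not match visible content can be ignored or penalized.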
HowTo:
Numbered step instructions with HowTo schema are frequently cited in procedural AI
responses. Include name, text, and optionally image for each step.
Article / NewsArticle: Establishes content type, author, datePublished, and publisher entity. Important for E-E-A-T signal and for establishing freshness.
Organization / SoftwareApplication:
Defines your brand/product as a recognized entity with sameAs links to authoritative
external profiles (Wikipedia, LinkedIn, Crunchbase, GitHub).
Entity authority signals
What entity authority means
In AI search, an "entity" is a distinct, named thing: a company, product, person, technology, or concept. AI engines maintain implicit entity graphs where certain entities are more trusted and recognized than others. High entity authority means the AI engine "knows" your entity and associates it with your domain of expertise.
Building entity authority
Wikipedia and Wikidata presence: Wikipedia is the most heavily weighted entity source for most AI systems. A well-sourced Wikipedia article about your company, product, or founders significantly boosts entity recognition. Wikidata entries (the structured data layer under Wikipedia) are directly ingested by many AI knowledge graph systems.
Criteria for Wikipedia notability: typically requires significant coverage in independent, reliable sources. If your entity is notable by Wikipedia standards, having a presence there is very high GEO value. Do not create or edit Wikipedia articles for promotional purposes - it backfires and violates Wikipedia policy.
Knowledge Panel maintenance: Google's Knowledge Panel pulls from Wikipedia, Wikidata, and other structured sources. Verify your entity's panel (if one exists) for accuracy. Claim it through Google Search Console to submit corrections.
Consistent entity naming: Choose a canonical entity name and use it consistently everywhere:
- Website content, title tags, and schema markup
- Social media profiles (LinkedIn, Twitter/X)
- External business directories (Crunchbase, Bloomberg, G2)
- Press releases and earned media
Inconsistency (sometimes "Acme Inc.", sometimes "Acme", sometimes "acme.io") fragments entity association and reduces authority.
sameAs links in schema:
Use sameAs in Organization or Person schema to explicitly link your entity across
different platforms:
{
"@type": "Organization",
"name": "Acme Inc.",
"url": "https://acme.com",
"sameAs": [
"https://en.wikipedia.org/wiki/Acme_Inc",
"https://www.linkedin.com/company/acme-inc",
"https://www.crunchbase.com/organization/acme-inc",
"https://twitter.com/acme"
]
}
Content freshness signals
AI engines favor current information, especially for fast-moving domains. Freshness signals:
- Accurate datePublished and dateModified in Article schema - Do not manipulate these dates; AI engines and search crawlers notice inconsistencies between stated and observed modification dates.
- Annual or periodic content refresh - Review top-ranking content annually to update statistics, add new sections, and remove outdated information.
- Publication date visible in content - "Last updated March 2025" in the page body reinforces freshness signals for AI extractors.
- New content velocity - Sites that publish regularly signal active maintenance and sustained authority. Stale content inventories (no new content in 6+ months) negatively impact entity authority over time.
Domain reputation signals
These are table-stakes prerequisites - without them, content-level GEO efforts are less effective:
- Domain age and backlink profile: Established domains with earned backlinks from authoritative sites are trusted more heavily by all AI engines.
- HTTPS and security: Mandatory. Non-HTTPS domains face ranking penalties that reduce eligibility for AI citations.
- Core Web Vitals: Heavily loaded, slow pages may be crawled less completely, reducing passage extraction quality.
- Thin content audit: Pages with fewer than 400-500 words of substantive content rarely get cited. Consolidate thin pages or expand them with specific, useful content.
- Spam and duplicate content: Duplicate content confuses entity association. Canonical tags must be correctly set.
Quick GEO optimization checklist
For auditing a single piece of content:
- At least one specific, attributed statistic in each major section
- External authoritative sources cited inline (not just in a reference list)
- One or more expert quotes with full attribution
- Headings phrased as questions or clear topic statements
- Each core paragraph is self-contained and answers one question fully
- FAQPage or HowTo schema applied where appropriate
- Article schema with accurate datePublished and author entity
- Organization schema with sameAs links on key landing pages
- Entity name consistent throughout page content and schema
- Page ranks in top 20 for target query (prerequisite for AI Overview citation)
- Engine crawlers (Googlebot, GPTBot, PerplexityBot, Bingbot) not blocked
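Parts of this checklist can be automated. As a sketch, the snippet below uses Python's standard html.parser to list the JSON-LD @type values present on a page, assuming each script tag of type application/ld+json holds a single top-level object (nested @graph structures would need extra handling):

```python
import json
from html.parser import HTMLParser

class SchemaExtractor(HTMLParser):
    """Collect JSON-LD blocks from <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld and data.strip():
            self.blocks.append(json.loads(data))

def schema_types(html: str) -> set[str]:
    """Return the set of top-level @type values found in a page's JSON-LD."""
    parser = SchemaExtractor()
    parser.feed(html)
    return {block.get("@type") for block in parser.blocks}
```

Running this against key landing pages quickly shows which ones are missing FAQPage, Article, or Organization markup.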
llms-txt-spec.md
LLMs.txt Specification - GEO Reference
Complete reference for the /llms.txt specification: what it is, the format, how
to implement it, and its relationship to the broader GEO ecosystem.
What is LLMs.txt?
LLMs.txt is a proposed standard for a plain-text file placed at the root of a website
(/llms.txt) that provides a curated, AI-readable summary of the site's content
structure. It is conceptually similar to robots.txt (which tells crawlers what to
access) but serves a different purpose: rather than controlling access, it guides AI
systems toward the most relevant and authoritative content on the site.
The specification was proposed by Jeremy Howard (fast.ai) in September 2024. As of early 2025, it is not an official W3C or IETF standard - it is a community proposal with growing adoption, especially among developer-facing documentation sites.
LLMs.txt vs robots.txt comparison:
| | robots.txt | llms.txt |
|---|---|---|
| Purpose | Control crawler access | Guide AI systems to relevant content |
| Format | Custom key-value syntax | Markdown |
| Enforcement | Honored by well-behaved crawlers | Advisory only - no enforcement mechanism |
| Location | /robots.txt | /llms.txt |
| Standard body | Informally standardized | Community proposal (no formal body) |
| Required? | De facto required | Optional, growing adoption |
File format
LLMs.txt uses Markdown syntax. The structure is:
# Site or Project Name
> One-sentence tagline or description of what this site/project is.
Optional introductory paragraph providing context that is useful for an AI system
trying to understand this site's purpose, audience, and content scope.
## Section Name
- [Page Title](https://example.com/page): Brief description of what this page contains
- [Another Page](https://example.com/other): Brief description
## Another Section
- [API Reference](https://docs.example.com/api): Complete REST API documentation
Format rules
- H1 (`#`): The site or project name. Required. Appears once at the top.
- Blockquote (`>`): A single-sentence description. Required. Immediately follows the H1.
- Introductory paragraph: Optional. Provides context. Comes after the blockquote.
- H2 (`##`) sections: Logical groupings of content. Use descriptive labels like "Documentation", "Guides", "API Reference", "About", "Blog".
- List items: Each item is a Markdown link with an optional inline description after a colon. The description should be 1-2 sentences explaining what the page contains, not just restating the title.
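These rules are mechanical enough to lint before deploying. A minimal structural check - a sketch against the community spec, not an official validator:

```python
import re

def validate_llms_txt(text: str) -> list[str]:
    """Return a list of structural problems in an llms.txt file (empty = OK)."""
    lines = [line for line in text.splitlines() if line.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("missing H1 title on first line")
    if len(lines) < 2 or not lines[1].startswith("> "):
        problems.append("missing blockquote summary after H1")
    link = re.compile(r"^- \[[^\]]+\]\(https?://[^)]+\)")
    if not any(link.match(line) for line in lines):
        problems.append("no markdown link list items found")
    return problems
```

A well-formed file passes all three checks; running the linter in CI catches regressions when the file is regenerated.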
Naming guidance for sections
Good section names:
- "Documentation" - core product/service documentation
- "Getting Started" - onboarding and quickstart content
- "API Reference" - technical API documentation
- "Guides" - how-to and tutorial content
- "About" - company/project information
- "Blog" - articles, engineering posts, release notes
- "Changelog" - version history
Avoid vague section names like "Resources" or "Links" - they provide no signal to AI systems about what the content is.
llms-full.txt variant
The specification also defines an optional /llms-full.txt file. Where llms.txt
contains a curated index of links, llms-full.txt contains the full text content
of the site's most important pages, concatenated.
The purpose of llms-full.txt is to allow AI systems (especially those with large
context windows) to ingest a complete snapshot of a site's content in a single request,
rather than following individual links.
When to use llms-full.txt:
- Documentation sites where AI tools (like IDE assistants) need offline/cached access
- Sites targeting AI coding tools that context-load documentation before answering questions
- Smaller sites where full content fits in a reasonable context window (< 100K tokens)
Format: Plain text or Markdown, sections separated by ---, each section starting
with the source URL as a reference comment:
<!-- Source: https://docs.example.com/quickstart -->
# Quickstart Guide
[Full content of the quickstart page here]
---
<!-- Source: https://docs.example.com/api/auth -->
# Authentication
[Full content of the auth page here]
Practical limit: llms-full.txt files larger than ~1-2MB become impractical for
most AI consumption scenarios. Prioritize the most commonly needed content rather than
attempting to include everything.
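Generating the file is mostly concatenation using the separator and source-comment convention shown above, plus a size guard for the practical limit. A hypothetical helper (the function name and signature are illustrative):

```python
def build_llms_full(
    pages: list[tuple[str, str]], max_bytes: int = 2_000_000
) -> str:
    """Concatenate (source_url, markdown_body) pages into llms-full.txt content,
    stopping before the cumulative size exceeds the practical ~2MB limit."""
    sections, size = [], 0
    for url, body in pages:
        section = f"<!-- Source: {url} -->\n{body.strip()}"
        size += len(section.encode("utf-8"))
        if size > max_bytes:
            break  # pages should be ordered most-important-first
        sections.append(section)
    return "\n\n---\n\n".join(sections)
```

Because the loop stops at the limit, ordering the input list by importance ensures the most commonly needed content always makes it into the file.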
Relationship to robots.txt
LLMs.txt and robots.txt serve complementary roles:
- `robots.txt` controls which crawlers can access which paths. If you block GPTBot in `robots.txt`, those pages won't appear in ChatGPT Search regardless of what `llms.txt` says.
- `llms.txt` guides AI systems that have already been granted access (i.e., not blocked by `robots.txt`) toward the most relevant content.
Common configuration mistake: Adding an llms.txt file while simultaneously
blocking AI crawlers in robots.txt. The llms.txt will be ignored because the
crawlers can't reach it or won't trust the guidance from a site that blocks them.
Recommended robots.txt for GEO-friendly configuration:
User-agent: *
Allow: /
User-agent: Googlebot
Allow: /
User-agent: GPTBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: bingbot
Allow: /
Only add Disallow rules for paths that should genuinely not be indexed (admin panels,
authenticated content, private APIs, etc.).
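Before deploying a robots.txt change, you can verify that none of the AI crawlers above are accidentally blocked using Python's standard urllib.robotparser:

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Googlebot"]

def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that the given robots.txt would block from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]
```

An empty result means every listed crawler can fetch the URL; anything returned is a misconfiguration to fix before the llms.txt guidance can matter.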
Implementation guide
Step 1 - Identify your key content
Before writing llms.txt, inventory your site's most important pages by category:
- Core documentation or product pages (highest priority)
- Getting started / quickstart content (high priority - most commonly queried)
- API reference or technical specifications (high priority for technical audiences)
- Blog posts and guides (medium priority - selective, pick evergreen content)
- About / company pages (lower priority - include briefly)
Aim for 15-40 links total. Too few provides insufficient guidance; too many dilutes the signal and starts to look like a sitemap rather than a curated index.
Step 2 - Write descriptions
For each link, write a description that tells an AI system what the page answers, not just what topic it covers. Compare:
# Weak descriptions
- [Authentication](https://docs.example.com/auth): Authentication information
- [API Limits](https://docs.example.com/limits): Rate limits
# Strong descriptions
- [Authentication](https://docs.example.com/auth): How to generate API keys,
implement OAuth 2.0 flows, and verify webhook signatures. Includes code
examples for Node.js and Python.
- [API Limits](https://docs.example.com/limits): Rate limit tiers by plan,
429 error handling, and best practices for burst traffic management.
Step 3 - Deploy the file
Place llms.txt at the root of your web server so it is accessible at
https://yourdomain.com/llms.txt. Ensure:
- Content-type is `text/plain` or `text/markdown`
- File is accessible without authentication
- File is NOT listed in robots.txt Disallow rules
- File is served over HTTPS
For static sites, most frameworks support this by placing the file in the public/
or static/ directory. For dynamic sites, add a route that serves the file.
Step 4 - Add llms-full.txt (optional)
If your site is documentation-heavy and you want to support AI tools that load content
rather than follow links, generate llms-full.txt by concatenating the Markdown source
of your most important pages. Many documentation frameworks (Docusaurus, GitBook, Mintlify)
have plugins or scripts that can generate this automatically.
Full example implementations
Developer documentation site (e.g., API product)
# Acme API Documentation
> RESTful payment processing API for e-commerce and SaaS platforms.
Acme's API enables developers to accept payments, manage subscriptions, and
handle payouts. This documentation covers authentication, all API endpoints,
error handling, and SDK usage.
## Getting Started
- [Quickstart](https://docs.acme.com/quickstart): Process your first payment
in under 5 minutes. Covers API key setup, a test charge, and response handling.
- [Core Concepts](https://docs.acme.com/concepts): Mental model for Acme's data
objects (charges, customers, subscriptions, payment methods) and how they relate.
## Authentication
- [API Keys](https://docs.acme.com/auth/api-keys): Creating and rotating API keys,
test vs live mode, key scoping for least-privilege access.
- [OAuth 2.0](https://docs.acme.com/auth/oauth): Implementing OAuth for third-party
integrations. Authorization code flow, refresh tokens, and scopes reference.
- [Webhook Signatures](https://docs.acme.com/auth/webhooks): Verifying webhook
payloads using HMAC-SHA256. Required for production webhook security.
## API Reference
- [Charges](https://docs.acme.com/api/charges): Create, capture, and refund charges.
Full parameter reference and response schema.
- [Customers](https://docs.acme.com/api/customers): Customer object CRUD operations,
payment method attachment, and customer portal configuration.
- [Subscriptions](https://docs.acme.com/api/subscriptions): Subscription lifecycle,
proration, trial periods, and cancellation handling.
- [Webhooks](https://docs.acme.com/api/webhooks): Event types reference, payload
schemas, retry behavior, and testing with the CLI.
## SDKs
- [Node.js SDK](https://docs.acme.com/sdk/node): Installation, initialization,
and complete method reference for the official Node.js library.
- [Python SDK](https://docs.acme.com/sdk/python): Installation and method reference
for the official Python library.
## About
- [Company](https://acme.com/about): Acme Inc. is a YC-backed payments infrastructure
company founded in 2021, processing $2B+ annually.
- [Status](https://status.acme.com): Real-time API uptime and incident history.
Content/Marketing site
# Acme Marketing Blog
> Data-driven marketing strategies and case studies for B2B SaaS companies.
## Popular Guides
- [Email Deliverability Guide](https://acme.com/guides/email-deliverability): Complete
guide to improving inbox placement rates, covering SPF/DKIM/DMARC setup,
list hygiene, and warm-up strategies.
- [SaaS Pricing Playbook](https://acme.com/guides/saas-pricing): Framework for
value-based pricing, packaging strategies, and pricing page optimization with
A/B test data from 50+ SaaS companies.
## Research & Data
- [2024 B2B Marketing Report](https://acme.com/research/2024-b2b-marketing): Annual
survey of 500 B2B marketing leaders on budget allocation, channel performance,
and AI tool adoption.
## About
- [About Acme](https://acme.com/about): Marketing analytics platform helping
B2B SaaS companies attribute revenue to content and campaigns.
Adoption status and AI engine support
As of early 2025:
- Anthropic (Claude): Has stated support for the `llms.txt` concept and crawls these files. Claude's web browsing uses llms.txt as a navigation signal.
- OpenAI (ChatGPT): No official statement, but GPTBot crawls and likely processes the files as structured text.
- Perplexity: PerplexityBot crawls llms.txt files and the team has indicated awareness of the spec.
- Google: No official statement as of early 2025. Googlebot indexes the file as a regular crawled page, but specific llms.txt-aware behavior in AI Overviews has not been confirmed.
- Developer tools (GitHub Copilot, Cursor, Codeium): Several AI coding tools have implemented llms.txt support for loading documentation context.
Spec home: The canonical spec and a registry of sites with llms.txt are maintained
at https://llmstxt.org (community site, not a formal standards body).
Frequently Asked Questions
What is geo-optimization?
Use this skill when optimizing for AI-powered search engines and generative search results - Google AI Overviews, ChatGPT Search (SearchGPT), Perplexity, Microsoft Copilot Search, and other LLM-powered answer engines. Covers Generative Engine Optimization (GEO), citation signals for AI search, entity authority, LLMs.txt specification, and LLM-friendliness patterns based on Princeton GEO research. Triggers on visibility in AI search, getting cited by LLMs, or adapting SEO for the AI search era.
How do I install geo-optimization?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill geo-optimization in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support geo-optimization?
geo-optimization works with claude-code, gemini-cli, openai-codex, mcp. Install it once and use it across any supported AI coding agent.