video-analyzer
Use this skill when analyzing existing video files using FFmpeg and AI vision, extracting frames for design system generation, detecting scene boundaries, analyzing animation timing, extracting color palettes, or understanding audio-visual sync. Triggers on video analysis, frame extraction, scene detection, ffprobe, motion analysis, and AI vision analysis of video content.
video ffmpeg video-analysis frame-extraction ai-vision scene-detection design-system
What is video-analyzer?
Use this skill when analyzing existing video files using FFmpeg and AI vision, extracting frames for design system generation, detecting scene boundaries, analyzing animation timing, extracting color palettes, or understanding audio-visual sync. Triggers on video analysis, frame extraction, scene detection, ffprobe, motion analysis, and AI vision analysis of video content.
video-analyzer
video-analyzer is a production-ready AI agent skill for claude-code, gemini-cli, and openai-codex. It covers analyzing existing video files with FFmpeg and AI vision: extracting frames for design system generation, detecting scene boundaries, analyzing animation timing, extracting color palettes, and understanding audio-visual sync.
Quick Facts
| Field | Value |
|---|---|
| Category | video |
| Version | 0.1.0 |
| Platforms | claude-code, gemini-cli, openai-codex |
| License | MIT |
How to Install
- Make sure you have Node.js installed on your machine.
- Run the following command in your terminal:
npx skills add AbsolutelySkilled/AbsolutelySkilled --skill video-analyzer
- The video-analyzer skill is now available in your AI coding agent (Claude Code, Gemini CLI, OpenAI Codex, etc.).
Overview
Video analysis is the practice of extracting structured information from video files - metadata, keyframes, scene boundaries, color palettes, motion data, and audio characteristics. A well-built video analysis pipeline combines FFmpeg for frame extraction and signal processing with AI vision models for semantic understanding of visual content. This skill covers the full workflow from raw video files to actionable data: using ffprobe for metadata inspection, FFmpeg filter graphs for frame extraction and scene detection, audio analysis for silence and volume detection, and AI vision for design system extraction and content understanding.
The two pillars of video analysis are FFmpeg (the Swiss Army knife of media processing) and AI vision models (for understanding what is in each frame). FFmpeg handles the mechanical work - splitting video into frames, detecting scene changes via pixel difference thresholds, extracting audio waveforms. AI vision handles the semantic work - identifying UI components, reading text, extracting color values, and understanding layout patterns.
Tags
ffmpeg video-analysis frame-extraction ai-vision scene-detection design-system
Platforms
- claude-code
- gemini-cli
- openai-codex
Frequently Asked Questions
What is video-analyzer?
Use this skill when analyzing existing video files using FFmpeg and AI vision, extracting frames for design system generation, detecting scene boundaries, analyzing animation timing, extracting color palettes, or understanding audio-visual sync. Triggers on video analysis, frame extraction, scene detection, ffprobe, motion analysis, and AI vision analysis of video content.
How do I install video-analyzer?
Run npx skills add AbsolutelySkilled/AbsolutelySkilled --skill video-analyzer in your terminal. The skill will be immediately available in your AI coding agent.
What AI agents support video-analyzer?
This skill works with claude-code, gemini-cli, openai-codex. Install it once and use it across any supported AI coding agent.
Maintainers
Generated from AbsolutelySkilled
SKILL.md
Video Analyzer
When to use this skill
Trigger this skill when the user:
- Wants to extract frames from a video at regular intervals or scene boundaries
- Needs to analyze video metadata (resolution, duration, codecs, bitrate)
- Asks about scene detection or scene change timestamps
- Wants to extract a color palette or design system from video content
- Needs to analyze audio tracks (silence detection, volume levels, waveforms)
- Asks about motion analysis or animation timing from video
- Wants to use AI vision to understand video content frame by frame
- Needs to generate thumbnails or preview strips from video files
Do NOT trigger this skill for:
- Creating or editing videos from scratch - use remotion-video or video-creator
- Writing video scripts or storyboards - use video-scriptwriting
- Live video streaming or real-time video processing
- Video encoding/transcoding for distribution (that is a rendering task, not analysis)
Key principles
Extract then analyze - Always separate frame extraction (FFmpeg) from semantic analysis (AI vision). Trying to do both in one step leads to brittle pipelines. Extract frames to disk first, then analyze them.
Use ffprobe before ffmpeg - Before processing any video, inspect it with ffprobe to understand its properties. Blindly running FFmpeg commands on unknown formats leads to silent failures and corrupted output.
Scene detection over fixed intervals - When analyzing video content, extract frames at scene boundaries rather than fixed time intervals. Scene change frames capture the visual diversity of the video with far fewer frames than one-per-second extraction.
JSON output everywhere - Use ffprobe's JSON output format and structure your analysis results as JSON. This makes pipelines composable and results machine-readable.
Disk space awareness - Video frame extraction can generate thousands of large image files. Always estimate output size before extracting, use appropriate image formats (JPEG for analysis, PNG for pixel-perfect work), and clean up temporary frames after analysis.
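The size estimate can be sketched as plain shell arithmetic. The numbers below are illustrative assumptions, not measurements:

```shell
# Rough disk estimate before extracting frames (illustrative values).
duration_seconds=3600     # from ffprobe: a 60-minute video
frames_per_second=1       # the fps filter setting you plan to use
avg_frame_kb=3000         # ~3 MB per 1080p PNG; JPEG is roughly 10x smaller

total_frames=$((duration_seconds * frames_per_second))
total_mb=$((total_frames * avg_frame_kb / 1024))
echo "about ${total_frames} frames, roughly ${total_mb} MB"
```

Running the numbers like this before extraction makes it obvious when JPEG or a lower sampling rate is the right call.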
Core concepts
FFmpeg pipeline architecture
FFmpeg processes video through a pipeline of demuxing, decoding, filtering, encoding, and muxing. For analysis, we primarily use the decode and filter stages:
Input file -> Demuxer -> Decoder -> Filter graph -> Output (frames/data)
Key filter concepts for analysis:
- select filter: choose which frames to output based on expressions
- showinfo filter: print frame metadata (timestamps, picture type, etc.)
- scene score: pixel-level difference score between consecutive frames
- fps filter: reduce frame rate to extract at regular intervals
Scene detection
Scene detection works by comparing consecutive frames using pixel difference.
FFmpeg's scene filter produces a score from 0.0 (identical) to 1.0
(completely different). A threshold of 0.3-0.4 catches major scene changes
while ignoring camera motion and lighting shifts.
| Threshold | Behavior |
|---|---|
| 0.1-0.2 | Very sensitive - catches pans, zooms, lighting changes |
| 0.3-0.4 | Balanced - catches cuts, transitions, major changes |
| 0.5-0.7 | Conservative - only hard cuts and dramatic scene changes |
| 0.8-1.0 | Too strict - misses most scene changes |
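To see how a threshold choice affects frame counts without re-running extraction, the per-frame scores from a metadata=print pass can be thresholded with standard tools. A sketch with made-up sample scores:

```shell
# Count how many frames exceed each candidate threshold.
# The scores below are made-up samples; in practice, parse them out of
# ffmpeg's metadata=print output (lavfi.scene_score lines).
scores="0.05 0.12 0.35 0.08 0.41 0.72 0.15 0.33"

count_above() {
  # Force numeric comparison so awk does not fall back to string compare.
  printf '%s\n' $scores | awk -v t="$1" '$1 + 0 > t + 0 { c++ } END { print c + 0 }'
}

for t in 0.2 0.4 0.6; do
  echo "threshold $t -> $(count_above $t) scene changes"
done
```

This lets you sweep several thresholds against one scoring pass instead of re-decoding the video for each attempt.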
AI vision analysis workflow
The workflow for extracting structured data from video using AI vision:
- Probe - Get video metadata with ffprobe (duration, resolution, fps)
- Extract - Pull key frames at scene boundaries using FFmpeg
- Read - Load each frame image using the Read tool (supports images)
- Analyze - For each frame, identify colors, typography, layout, components
- Aggregate - Find consistent patterns across frames
- Output - Produce structured design system or content analysis
Common tasks
1. Install and verify FFmpeg
Check if FFmpeg is available and inspect its version and capabilities.
# Check FFmpeg installation
ffmpeg -version
# Check ffprobe installation
ffprobe -version
# Install on macOS
brew install ffmpeg
# Install on Ubuntu/Debian
sudo apt-get update && sudo apt-get install -y ffmpeg
# Verify supported formats
ffmpeg -formats 2>/dev/null | head -20
# Verify supported codecs
ffmpeg -codecs 2>/dev/null | grep -i h264
2. Extract key frames at scene boundaries
Extract only the frames where significant visual changes occur. This is the most efficient way to sample video content.
# Extract frames at scene changes (threshold 0.3)
mkdir -p scenes
ffmpeg -i input.mp4 \
-vf "select='gt(scene,0.3)',showinfo" \
-vsync vfr \
scenes/scene_%04d.png \
2>&1 | grep showinfo
# Extract with timestamps logged to a file
ffmpeg -i input.mp4 \
-vf "select='gt(scene,0.3)',showinfo" \
-vsync vfr \
scenes/scene_%04d.png \
2>&1 | grep "pts_time" > scenes/timestamps.txt
# Extract scene frames as JPEG (smaller files, good for analysis)
mkdir -p scenes
ffmpeg -i input.mp4 \
-vf "select='gt(scene,0.3)'" \
-vsync vfr \
-q:v 2 \
scenes/scene_%04d.jpg
3. Extract frames at regular intervals
When you need evenly spaced samples regardless of content changes.
# Extract one frame per second
mkdir -p frames
ffmpeg -i input.mp4 -vf "fps=1" frames/frame_%04d.png
# Extract one frame every 5 seconds
mkdir -p frames
ffmpeg -i input.mp4 -vf "fps=1/5" frames/frame_%04d.png
# Extract only I-frames (keyframes from the codec)
mkdir -p keyframes
ffmpeg -i input.mp4 \
-vf "select='eq(pict_type,I)'" \
-vsync vfr \
keyframes/kf_%04d.png
# Extract a single frame at a specific timestamp
ffmpeg -i input.mp4 -ss 00:01:30 -frames:v 1 thumbnail.png
# Extract first frame only
ffmpeg -i input.mp4 -frames:v 1 first_frame.png
4. Analyze video metadata with ffprobe
Inspect video properties before processing. Always use JSON output for machine-readable results.
# Full metadata as JSON (streams and format)
ffprobe -v quiet \
-print_format json \
-show_format \
-show_streams \
input.mp4
# Get duration only
ffprobe -v error \
-show_entries format=duration \
-of default=noprint_wrappers=1:nokey=1 \
input.mp4
# Get resolution
ffprobe -v error \
-select_streams v:0 \
-show_entries stream=width,height \
-of csv=s=x:p=0 \
input.mp4
# Get frame rate
ffprobe -v error \
-select_streams v:0 \
-show_entries stream=r_frame_rate \
-of default=noprint_wrappers=1:nokey=1 \
input.mp4
# Get codec information
ffprobe -v error \
-select_streams v:0 \
-show_entries stream=codec_name,codec_long_name,profile \
-of json \
input.mp4
# Count total frames
ffprobe -v error \
-count_frames \
-select_streams v:0 \
-show_entries stream=nb_read_frames \
-of default=noprint_wrappers=1:nokey=1 \
input.mp4
5. Detect scenes and list timestamps
Get a list of scene change timestamps without extracting frames.
# List scene change timestamps
ffmpeg -i input.mp4 \
-vf "select='gt(scene,0.3)',showinfo" \
-f null - \
2>&1 | grep pts_time
# Extract scene scores for every frame (for analysis)
ffmpeg -i input.mp4 \
-vf "select='gte(scene,0)',metadata=print" \
-f null - \
2>&1 | grep "lavfi.scene_score"
# Count number of scene changes
ffmpeg -i input.mp4 \
-vf "select='gt(scene,0.3)',showinfo" \
-f null - \
2>&1 | grep -c "pts_time"
6. Extract audio waveform and detect silence
Analyze the audio track for silence gaps, volume levels, and visual waveforms.
# Detect silence periods (useful for finding chapter breaks)
ffmpeg -i input.mp4 \
-af silencedetect=noise=-30dB:d=0.5 \
-f null - \
2>&1 | grep silence
# Generate audio waveform as image
ffmpeg -i input.mp4 \
-filter_complex "showwavespic=s=1920x200:colors=blue" \
-frames:v 1 \
waveform.png
# Analyze volume levels
ffmpeg -i input.mp4 \
-af volumedetect \
-f null - \
2>&1 | grep volume
# Extract audio spectrum visualization
ffmpeg -i input.mp4 \
-filter_complex "showspectrumpic=s=1920x512:color=intensity" \
-frames:v 1 \
spectrum.png
7. AI vision analysis workflow
Extract frames then analyze them with Claude's vision capability to extract structured information from video content.
# Step 1: Probe the video
ffprobe -v quiet -print_format json -show_format -show_streams input.mp4
# Step 2: Extract scene frames
mkdir -p analysis_frames
ffmpeg -i input.mp4 \
-vf "select='gt(scene,0.3)'" \
-vsync vfr \
-q:v 2 \
analysis_frames/frame_%04d.jpg
After extracting frames, use the Read tool to load each image. The Read tool supports image files (PNG, JPG, etc.) and will present them visually. For each frame, analyze:
- Colors: Extract dominant hex color values, background colors, accent colors
- Typography: Identify font sizes, weights, line heights, heading hierarchy
- Layout: Detect grid patterns, flex layouts, spacing rhythms, margins
- Components: Identify buttons, cards, headers, navigation, forms
- Animation state: Note transitions, hover states, loading indicators
Aggregate findings across all frames to build a consistent design system.
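To keep per-frame findings tied to the video timeline, the frame-to-timestamp mapping can be recovered from the showinfo log captured during extraction. A sketch; the log lines below are hypothetical samples of what ffmpeg prints:

```shell
# Map extracted frame numbers back to timestamps by parsing showinfo output.
# These two log lines are hypothetical examples, not real captured output.
log='[Parsed_showinfo_0 @ 0x1] n:   0 pts:  1536 pts_time:0.512 fmt:yuv420p
[Parsed_showinfo_0 @ 0x2] n:   1 pts:  7680 pts_time:2.56 fmt:yuv420p'

# Pull out the frame index (n) and presentation timestamp (pts_time).
mapping=$(printf '%s\n' "$log" | \
  sed -n 's/.*n: *\([0-9]*\).*pts_time:\([0-9.]*\).*/frame \1 -> \2s/p')
echo "$mapping"
```

With this mapping saved alongside the frames, each vision analysis result can cite the timestamp it came from.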
8. Design system extraction from video
A complete workflow for extracting a design system from a product demo or UI walkthrough video.
# Step 1: Get video info
ffprobe -v quiet -print_format json -show_format input.mp4
# Step 2: Extract scene frames (captures each unique screen)
mkdir -p design_frames
ffmpeg -i input.mp4 \
-vf "select='gt(scene,0.4)'" \
-vsync vfr \
-q:v 1 \
design_frames/screen_%04d.png
# Step 3: Also extract at regular intervals for coverage
ffmpeg -i input.mp4 \
-vf "fps=1/3" \
-q:v 1 \
design_frames/interval_%04d.png
After frame extraction, analyze each frame with AI vision and compile:
{
"colors": {
"primary": "#2563EB",
"secondary": "#7C3AED",
"background": "#FFFFFF",
"surface": "#F3F4F6",
"text": "#111827",
"textSecondary": "#6B7280"
},
"typography": {
"headingFont": "Inter",
"bodyFont": "Inter",
"scale": ["12px", "14px", "16px", "20px", "24px", "32px", "48px"]
},
"spacing": {
"unit": "8px",
"scale": ["4px", "8px", "12px", "16px", "24px", "32px", "48px", "64px"]
},
"components": ["button", "card", "navbar", "sidebar", "input", "modal"]
}
Anti-patterns / common mistakes
| Mistake | Why it is wrong | What to do instead |
|---|---|---|
| Extracting every frame from a video | Generates thousands of files, wastes disk and analysis time | Use scene detection or fixed intervals (1 fps or less) |
| Skipping ffprobe before processing | Unknown codecs or corrupt files cause silent FFmpeg failures | Always probe first to validate format and properties |
| Using PNG for bulk frame extraction | PNG files are 5-10x larger than JPEG with minimal quality gain for analysis | Use JPEG (-q:v 2) for analysis; PNG only for pixel-exact work |
| Setting scene threshold too low (0.1) | Catches camera motion, lighting shifts - produces too many frames | Start with 0.3-0.4 and adjust based on results |
| Ignoring -vsync vfr with select filter | Produces duplicate frames filling gaps in the timeline | Always use -vsync vfr when using the select filter |
| Analyzing frames without timestamps | Cannot correlate analysis results back to video timeline | Use showinfo filter to capture pts_time with each frame |
| Running AI vision on hundreds of frames | Exceeds context limits and wastes tokens | Limit to 10-20 representative frames per analysis pass |
| Hardcoding ffmpeg paths | Breaks across OS and install methods | Use ffmpeg and ffprobe directly, relying on PATH |
Gotchas
-vsync vfr is required with select filters - Without -vsync vfr, FFmpeg fills "missing" frames between selected frames with duplicates to maintain a constant frame rate. This means extracting 5 scene-change frames might produce 500 output files, most of them duplicates. Always pair select filters with -vsync vfr.
Scene detection threshold varies by content - A threshold of 0.3 works well for cuts in narrative video, but animated content or screen recordings may need 0.4-0.5 because gradual transitions produce lower scene scores. Always check the frame count after extraction and adjust the threshold.
ffprobe frame counting is slow - Using -count_frames with ffprobe decodes the entire video to count frames accurately. For long videos, this can take minutes. Use nb_frames from the stream metadata instead (less accurate but instant) or estimate from duration and frame rate.
Audio silence detection parameters need tuning - The default -30dB noise threshold for silence detection may be too sensitive for videos with background music or ambient noise. Start with -30dB and increase to -20dB or -15dB if too many silence periods are detected. The duration parameter d=0.5 means silence must last at least 0.5 seconds to register.
Large frame extractions fill disk quickly - A 1080p PNG frame is roughly 2-5MB. Extracting one frame per second from a 60-minute video produces 3600 frames (7-18GB). Always estimate output size first: duration_seconds * frames_per_second * avg_frame_size. Use JPEG for analysis workflows and clean up temporary frames promptly.
References
For detailed patterns on specific video analysis sub-domains, read the
relevant file from the references/ folder:
- references/ffmpeg-recipes.md - advanced FFmpeg filter graphs for motion analysis, thumbnail generation, video comparison, and color extraction
- references/vision-analysis-prompts.md - structured prompts for AI vision analysis of video frames including design system extraction, content categorization, and accessibility auditing
Only load a references file if the current task requires it - they are long and will consume context.
References
ffmpeg-recipes.md
FFmpeg Recipes
Advanced FFmpeg recipes for video analysis tasks. Load this file only when the task requires patterns beyond basic frame extraction and scene detection.
Motion vectors visualization
Visualize motion vectors to understand camera movement and object motion.
# Render motion vectors as overlay on video
ffmpeg -flags2 +export_mvs -i input.mp4 \
-vf "codecview=mv=pf+bf+bb" \
motion_vectors.mp4
# Extract motion vectors to text (requires custom build with debug)
ffmpeg -flags2 +export_mvs -i input.mp4 \
-vf "codecview=mv=pf+bf+bb,showinfo" \
-f null - 2>&1 | grep "showinfo"
Thumbnail sheet generation
Create a single image containing evenly spaced thumbnails from the video.
# Generate a 4x4 thumbnail grid
ffmpeg -i input.mp4 \
-vf "fps=1/30,scale=320:180,tile=4x4" \
-frames:v 1 \
thumbnail_sheet.png
# Generate thumbnail grid from first 2 minutes
ffmpeg -i input.mp4 -t 120 \
-vf "fps=1/15,scale=320:180,tile=4x4" \
-frames:v 1 \
preview_sheet.png
# Generate individual thumbnails at fixed size
ffmpeg -i input.mp4 \
-vf "fps=1/10,scale=160:90" \
-q:v 2 \
thumbs/thumb_%04d.jpg
Video comparison (side by side)
Compare two videos or a video against a reference frame.
# Side by side comparison of two videos
ffmpeg -i video_a.mp4 -i video_b.mp4 \
-filter_complex "[0:v]scale=960:540[left];[1:v]scale=960:540[right];[left][right]hstack" \
comparison.mp4
# Stack vertically for resolution comparison
ffmpeg -i original.mp4 -i compressed.mp4 \
-filter_complex "[0:v]scale=1920:540[top];[1:v]scale=1920:540[bottom];[top][bottom]vstack" \
vcompare.mp4
# Difference between two videos (highlights changes)
ffmpeg -i video_a.mp4 -i video_b.mp4 \
-filter_complex "[0:v][1:v]blend=all_mode=difference" \
diff.mp4
Color analysis
Extract dominant colors and color distribution from video frames.
# Extract histogram data for a single frame
ffmpeg -i input.mp4 -ss 00:00:30 -frames:v 1 \
-vf "histogram=display_mode=overlay" \
histogram.png
# Generate color palette from a frame (creates a 16-color palette)
ffmpeg -i input.mp4 -ss 00:00:30 -frames:v 1 \
-vf "palettegen=max_colors=16:stats_mode=full" \
palette.png
# Generate palette from entire video (averages across all frames)
ffmpeg -i input.mp4 \
-vf "palettegen=max_colors=16:stats_mode=full" \
video_palette.png
# Extract average color per frame as text
ffmpeg -i input.mp4 \
-vf "scale=1:1,showinfo" \
-f null - 2>&1 | grep "color"
Frame rate and speed analysis
Analyze and manipulate video timing.
# Show frame timestamps and durations
ffmpeg -i input.mp4 \
-vf "showinfo" \
-f null - 2>&1 | grep "showinfo" | head -50
# Detect variable frame rate issues
# (FFmpeg 5+ uses pts_time/duration_time; older releases used pkt_pts_time)
ffprobe -v error \
-select_streams v:0 \
-show_entries frame=pts_time,duration_time \
-of csv=p=0 \
input.mp4 | head -20
# Check for dropped frames by analyzing pts gaps
ffprobe -v error \
-select_streams v:0 \
-show_entries frame=pts_time \
-of csv=p=0 \
input.mp4 > frame_times.csv
Crop detection
Automatically detect black bars or letterboxing.
# Detect crop values (runs for 60 seconds of video)
ffmpeg -i input.mp4 -t 60 \
-vf "cropdetect=24:16:0" \
-f null - 2>&1 | grep "cropdetect" | tail -5
# Apply detected crop
ffmpeg -i input.mp4 \
-vf "crop=1920:800:0:140" \
cropped.mp4
Interlace detection
Detect if video content is interlaced.
# Detect interlacing
ffmpeg -i input.mp4 -t 30 \
-vf "idet" \
-f null - 2>&1 | grep "idet"
# Check with ffprobe
ffprobe -v error \
-select_streams v:0 \
-show_entries stream=field_order \
-of default=noprint_wrappers=1:nokey=1 \
input.mp4
Multi-pass analysis pipeline
Complex analysis combining multiple FFmpeg passes.
# Pass 1: Extract scene timestamps
ffmpeg -i input.mp4 \
-vf "select='gt(scene,0.3)',showinfo" \
-f null - 2>&1 | grep pts_time | \
sed 's/.*pts_time:\([0-9.]*\).*/\1/' > scene_times.txt
# Pass 2: Extract frame at each scene timestamp
mkdir -p scene_frames
counter=1
while IFS= read -r timestamp; do
ffmpeg -ss "$timestamp" -i input.mp4 \
-frames:v 1 -q:v 2 \
"scene_frames/scene_$(printf '%04d' $counter).jpg" \
-y 2>/dev/null
counter=$((counter + 1))
done < scene_times.txt
# Pass 3: Generate thumbnail sheet from scene frames
ffmpeg -framerate 1 -pattern_type glob \
-i "scene_frames/scene_*.jpg" \
-vf "scale=320:180,tile=5x4" \
-frames:v 1 \
scene_overview.png
Audio-visual sync analysis
Detect synchronization issues between audio and video.
# Extract audio peaks (loud moments)
ffmpeg -i input.mp4 \
-af "astats=metadata=1:reset=1,ametadata=print:key=lavfi.astats.Overall.Peak_level" \
-f null - 2>&1 | grep "Peak_level" > audio_peaks.txt
# Extract visual change moments
ffmpeg -i input.mp4 \
-vf "select='gt(scene,0.3)',showinfo" \
-f null - 2>&1 | grep "pts_time" > visual_changes.txt
# Compare timestamps to detect sync drift
# (manual comparison of the two files above)
Segment extraction
Extract specific segments for detailed analysis.
# Extract a time range (from 1:30 to 2:45)
ffmpeg -i input.mp4 -ss 00:01:30 -to 00:02:45 \
-c copy segment.mp4
# Extract first 30 seconds
ffmpeg -i input.mp4 -t 30 -c copy first_30s.mp4
# Split video into equal segments
ffmpeg -i input.mp4 \
-f segment -segment_time 60 \
-c copy \
segments/segment_%03d.mp4
# Extract segment around a specific timestamp (10 seconds before/after)
ffmpeg -i input.mp4 -ss 00:05:20 -t 20 \
-c copy context_clip.mp4
Bitrate analysis
Analyze bitrate distribution across the video.
# Show bitrate per frame
ffprobe -v error \
-select_streams v:0 \
-show_entries frame=pkt_size,pts_time \
-of csv=p=0 \
input.mp4 > bitrate_data.csv
# Get average and max bitrate
ffprobe -v error \
-show_entries format=bit_rate \
-of default=noprint_wrappers=1:nokey=1 \
input.mp4
# Stream-specific bitrate
ffprobe -v error \
-select_streams v:0 \
-show_entries stream=bit_rate,max_bit_rate \
-of json \
input.mp4
Useful filter combinations
Common filter chains for analysis workflows.
# Extract frames with burned-in timestamps
ffmpeg -i input.mp4 \
-vf "drawtext=text='%{pts\:hms}':fontsize=24:fontcolor=white:x=10:y=10,fps=1" \
frames_with_time/frame_%04d.png
# Extract frames with frame number overlay
ffmpeg -i input.mp4 \
-vf "drawtext=text='Frame %{n}':fontsize=24:fontcolor=white:x=10:y=10,fps=1" \
frames_numbered/frame_%04d.png
# Scale down before extraction (faster, smaller files)
ffmpeg -i input.mp4 \
-vf "scale=640:-1,fps=1" \
-q:v 3 \
small_frames/frame_%04d.jpg
# Extract frames with deinterlacing
ffmpeg -i input.mp4 \
-vf "yadif,fps=1" \
deinterlaced/frame_%04d.png
vision-analysis-prompts.md
Vision Analysis Prompts
Structured prompts and workflows for AI vision analysis of video frames. Load this file only when the task involves using AI models to understand video frame content semantically.
Design system extraction
Single frame analysis prompt
When analyzing a single frame for design system elements, structure the analysis with these exact categories:
Analyze this UI screenshot and extract the following design system elements:
1. COLORS
- Primary brand color (hex)
- Secondary/accent colors (hex)
- Background colors (hex)
- Text colors (hex for headings, body, secondary text)
- Border/divider colors (hex)
- Status colors if visible (success, warning, error - hex)
2. TYPOGRAPHY
- Heading sizes (estimate px values for h1-h6 visible)
- Body text size (px)
- Font weight variations visible (regular, medium, semibold, bold)
- Line height (tight, normal, relaxed)
- Letter spacing if notable
3. SPACING
- Base spacing unit (estimate the smallest consistent gap in px)
- Section padding
- Card/container padding
- Gap between elements
4. LAYOUT
- Grid system (columns, gutter width)
- Max content width
- Sidebar width if present
- Navigation height
5. COMPONENTS
- List each distinct UI component visible
- For each: describe shape, colors, padding, border-radius
- Note hover/active states if visible
6. ICONS AND IMAGERY
- Icon style (outlined, filled, duotone)
- Icon size
- Image aspect ratios used
- Avatar sizes if present
Output as structured JSON.
Multi-frame aggregation workflow
When analyzing multiple frames from the same video:
- Analyze each frame independently using the single frame prompt above
- Track consistency - note which values appear in 3+ frames
- Resolve conflicts by majority vote (most common value wins)
- Flag variations - different button styles may indicate primary vs secondary
- Build the final system using only values confirmed across multiple frames
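Step 3's majority vote can be done with standard shell tools once per-frame values are collected. A sketch with hypothetical per-frame results:

```shell
# Majority vote: the most frequent value across frames wins.
# Hypothetical per-frame extractions of the primary brand color.
frame_values="#2563EB
#2563EB
#1D4ED8
#2563EB"

# sort | uniq -c counts occurrences; sort -rn puts the winner first.
winner=$(printf '%s\n' "$frame_values" | sort | uniq -c | sort -rn | \
  awk 'NR == 1 { print $2 }')
echo "primary color: $winner"
```

The same count-and-rank pattern works for any per-frame property (font sizes, spacing units) before handing the aggregated values to the prompt below.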
Aggregation prompt
I have analyzed N frames from a product video. Here are the per-frame results:
[paste frame results]
Aggregate these into a single design system by:
1. Using the most frequently occurring value for each property
2. Flagging any property where frames disagree significantly
3. Identifying component variants (e.g., primary button vs ghost button)
4. Noting any responsive layout changes between frames
Output the final design system as a JSON object with these top-level keys:
colors, typography, spacing, layout, components, icons
Content categorization
Frame content classification prompt
Classify this video frame into one or more categories:
- UI/Product: Shows a software interface, app screen, or website
- Presentation: Shows a slide deck or presentation content
- Talking Head: Shows a person speaking to camera
- Screen Recording: Shows a computer screen with cursor activity
- Animation: Shows motion graphics or animated content
- Whiteboard: Shows diagrams, sketches, or whiteboard content
- Code: Shows code editor, terminal, or code snippets
- Data/Charts: Shows graphs, charts, dashboards, or data visualizations
- B-Roll: Shows supplementary footage (office, nature, etc.)
- Title Card: Shows a title, intro, or outro card
For each category detected, provide:
- Confidence: high/medium/low
- Key elements that led to this classification
- Suggested timestamp label (e.g., "Product Demo", "Feature Overview")
Video chapter generation
I have extracted frames at these timestamps from a video:
[list of timestamp + frame description pairs]
Generate a chapter list for this video by:
1. Grouping consecutive frames with similar content types
2. Naming each chapter with a descriptive title (3-6 words)
3. Using the first timestamp of each group as the chapter start
4. Including a one-sentence summary per chapter
Output format:
- 00:00 - Chapter Title - Summary
- 01:23 - Chapter Title - Summary
Accessibility auditing
Contrast analysis prompt
Analyze this UI screenshot for accessibility issues:
1. COLOR CONTRAST
- Check text against background for WCAG AA compliance
- Flag any text that appears to have insufficient contrast
- Note the approximate contrast ratio (estimate)
2. TEXT SIZING
- Flag any text that appears smaller than 12px
- Note if body text appears smaller than 16px
- Check that heading hierarchy is clear
3. TOUCH TARGETS
- Flag any interactive elements that appear smaller than 44x44px
- Note spacing between clickable elements
4. VISUAL INDICATORS
- Check if information is conveyed by color alone
- Note presence of icons, underlines, or other non-color indicators
5. LAYOUT ISSUES
- Check for text that may be truncated or overflow
- Note any elements that appear to overlap
- Flag horizontal scrolling indicators
Rate overall accessibility: Good / Needs Improvement / Poor
Provide specific fix recommendations for each issue found.
Animation and transition analysis
Transition detection prompt
I have extracted frames at 0.1-second intervals during a transition.
The frames show consecutive states of a UI change.
Analyze the transition and describe:
1. TRANSITION TYPE
- Fade, slide, scale, rotate, morph, or combination
- Direction (left-to-right, top-to-bottom, center-out, etc.)
2. TIMING
- Estimated total duration based on number of frames
- Easing curve (linear, ease-in, ease-out, ease-in-out, spring)
- Any delay before the transition starts
3. ELEMENTS INVOLVED
- Which elements are entering the view
- Which elements are leaving the view
- Which elements are changing state (color, size, position)
4. CSS EQUIVALENT
- Write the CSS transition or animation that would reproduce this
- Include keyframes if complex
- Specify timing function
Output the CSS implementation.
Animation timing extraction
These frames show an animation sequence extracted at regular intervals.
For each frame pair (current vs previous), describe:
- Which properties changed (position, opacity, scale, color)
- Estimated magnitude of change
- Whether the change is accelerating or decelerating
Then synthesize into an animation specification:
- Total duration estimate
- Keyframe breakdown (0%, 25%, 50%, 75%, 100%)
- Recommended easing function
- CSS @keyframes implementation
Component inventory
UI component detection prompt
Scan this UI screenshot and create a component inventory.
For each unique component type found, document:
1. COMPONENT NAME (use common design system naming)
2. VARIANTS VISIBLE (e.g., primary button, secondary button)
3. VISUAL PROPERTIES
- Background color (hex)
- Border (width, color, radius)
- Padding (estimate in px)
- Shadow (if any)
4. CONTENT PATTERN
- Text content pattern (e.g., "short label", "sentence", "paragraph")
- Icon position (left, right, none)
- Image usage (avatar, thumbnail, hero)
5. STATE (default, hover, active, disabled, if distinguishable)
Group components by category:
- Navigation (navbar, sidebar, breadcrumbs, tabs)
- Content (cards, lists, tables, text blocks)
- Input (buttons, text fields, selects, checkboxes)
- Feedback (alerts, toasts, modals, tooltips)
- Layout (containers, grids, dividers, spacers)
Output as structured JSON array.
Structured output templates
Design tokens JSON template
{
"colors": {
"primary": { "value": "#hex", "usage": "CTA buttons, links" },
"secondary": { "value": "#hex", "usage": "secondary actions" },
"neutral": {
"50": "#hex",
"100": "#hex",
"200": "#hex",
"300": "#hex",
"400": "#hex",
"500": "#hex",
"600": "#hex",
"700": "#hex",
"800": "#hex",
"900": "#hex"
},
"semantic": {
"success": "#hex",
"warning": "#hex",
"error": "#hex",
"info": "#hex"
}
},
"typography": {
"fontFamily": {
"heading": "font name",
"body": "font name",
"mono": "font name"
},
"fontSize": {
"xs": "12px",
"sm": "14px",
"base": "16px",
"lg": "18px",
"xl": "20px",
"2xl": "24px",
"3xl": "30px",
"4xl": "36px"
},
"fontWeight": {
"regular": 400,
"medium": 500,
"semibold": 600,
"bold": 700
}
},
"spacing": {
"unit": "4px",
"scale": ["4px", "8px", "12px", "16px", "24px", "32px", "48px", "64px"]
},
"borderRadius": {
"sm": "4px",
"md": "8px",
"lg": "12px",
"full": "9999px"
},
"shadows": {
"sm": "0 1px 2px rgba(0,0,0,0.05)",
"md": "0 4px 6px rgba(0,0,0,0.1)",
"lg": "0 10px 15px rgba(0,0,0,0.1)"
}
}
Component inventory JSON template
{
"components": [
{
"name": "Button",
"variants": [
{
"name": "primary",
"background": "#hex",
"textColor": "#hex",
"borderRadius": "8px",
"padding": "12px 24px",
"fontSize": "14px",
"fontWeight": 600
}
],
"occurrences": 5,
"frames": [1, 3, 5, 8, 12]
}
]
}
Best practices for vision analysis
Use the highest quality frames - Extract frames as PNG at original resolution for accurate color and typography analysis. JPEG compression shifts color values.
Analyze in batches of 5-10 - Processing too many frames at once exceeds context limits. Batch frames and aggregate results.
Provide reference context - Tell the vision model what the video is about (product demo, tutorial, etc.) for better component naming.
Validate hex values - Vision models estimate colors; verify extracted hex values by sampling actual pixel values from the PNG frames.
Cross-reference with code - If the analyzed product has a public repository, cross-reference extracted design tokens with actual CSS/theme files for ground truth.
Account for video compression - Video codecs compress colors and blur fine text. Extract frames from the highest quality source available and note that typography identification may be approximate.