JunSeo99/claude-skill-codex-imagegen


claude-skill-codex-imagegen

Use gpt-image-2, OpenAI's most capable image generation model, from inside Claude Code.

🌐 English · 한국어 · 日本語 · 简体中文


📦 What

A Claude Code skill that calls Codex CLI's $imagegen (gpt-image-2) on plain natural-language asks — "generate a hero image", "make a favicon", "insert images that fit the site" — and lands the result where you actually wanted it. No new slash command to learn. Claude calls it as part of whatever it's already doing.

💡 Why

Claude Code has no built-in image model, so most vibe-coded sites either ship without imagery or paste in stock photos that don't match. Generated images from a year ago screamed "AI" louder than the layout did, so people stopped trying. gpt-image-2 finally clears that bar with near-perfect text rendering, consistent lighting, and real subject framing, which makes the image layer the cheapest way out of the "every AI site looks the same" trap. This skill makes that an in-session step, optionally guided by a DESIGN.md you keep at your project root.

🚀 Quickstart

git clone https://github.com/JunSeo99/claude-skill-codex-imagegen \
  ~/.claude/skills/codex-imagegen

Restart Claude Code, then ask in natural language:

"Generate a 1600×900 hero image for this landing page, save to assets/hero.png."

Want consistency across an entire site? Drop a DESIGN.md at the project root, then:

"Using DESIGN.md as the style reference, insert images that fit the site."

That's it. Full details below.


Claude Code does not ship with an image-generation model of its own. This skill closes that gap by teaching Claude Code to call gpt-image-2 through the OpenAI Codex CLI's built-in $imagegen feature — so you can generate icons, banners, OG cards, illustrations, infographics, and photo edits without ever leaving your Claude Code session.

The skill bundles a verified prompting playbook, a CLI reference, a security note, and a sample asset produced during validation.

[Image: a single white origami crane on a soft warm-gray surface. Sample 1600×900 hero image generated by gpt-image-2 via this skill.]

Why this skill exists

Claude Code can already drive the Codex CLI, but $imagegen has rough edges that Claude misses on its own:

  • gpt-image-2 ignores the exact output size you request (e.g. 256×256 → 1254×1254)
  • Transparent PNGs are not supported by gpt-image-2 (only gpt-image-1.5 supports them, per the OpenAI guide)
  • The raw PNG lands at ~/.codex/generated_images/<session-uuid>/ig_*.png — not where you asked
  • "Stunning, cinematic, 8K" keyword prompts produce visibly worse output than the five-part structured prompts the official OpenAI Cookbook recommends
  • The naive non-interactive recipe requires --dangerously-bypass-approvals-and-sandbox, which hands the Codex sub-agent broad shell power — not a safe default

This skill bakes those facts in, defaults to a safer split workflow (Codex generates only; the host does the file moves), and only opts into the bypass mode when explicitly requested.
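The host-side half of the split workflow can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: it assumes the `<session-uuid>/ig_*.png` layout described above, and the helper names `newest_generated_image` and `collect_image` are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional


def newest_generated_image(root: Path) -> Optional[Path]:
    """Return the most recently modified ig_*.png one level under root, or None."""
    candidates = list(root.glob("*/ig_*.png"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def collect_image(root: Path, dest: Path) -> Path:
    """Copy the newest generated image to dest, creating parent folders as needed."""
    src = newest_generated_image(root)
    if src is None:
        raise FileNotFoundError(f"no ig_*.png found under {root}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return dest
```

A call such as `collect_image(Path.home() / ".codex" / "generated_images", Path("assets/hero.png"))` would run in the host's own context, so the Codex sub-agent never needs write access to the project.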

What people use it for

The tool is general — anything that needs a PNG/JPEG/WebP written to disk fits. In practice the workflows that come up most often:

  • Hero images and background photography for landing pages and marketing sites
  • OG cards and social previews generated per page
  • Favicons and app icons at the sizes you actually need
  • Blog post illustrations that match the post's tone instead of leaning on stock libraries
  • Brand asset drafts — logos, banners, badges — to iterate before committing to a designer
  • Infographic placeholders and diagrams with consistent visual language
  • Photo edits — change-X-keep-Y patterns on an existing image

The skill was originally built for solo developers shipping sites without a designer, where image quality and stylistic consistency are the main signals separating a vibe-coded site from a polished product. With a DESIGN.md at the project root (see Usage), Claude Code can generate a coherent image set across the whole site in one pass. None of that requires a site, though; the skill is just as happy producing a single OG card or a batch of game-asset placeholders.

⚠️ Security note: this skill defines two run modes. The default is safe; the opt-in "automated" mode uses --dangerously-bypass-approvals-and-sandbox. Read SECURITY.md before using the automated mode in a directory whose prompts or contents you do not control.

Requirements

  • macOS or Linux
  • Claude Code — this skill is a filesystem skill loaded from ~/.claude/skills/, which is a Claude Code feature (Claude.ai web uses a different skill upload mechanism)
  • Codex CLI v0.130 or newer (npm i -g @openai/codex)
  • A logged-in Codex session (codex login) — uses your ChatGPT/Codex subscription
  • Optional: OPENAI_API_KEY in your environment to bill batch jobs against the API instead

Verified against codex-cli 0.130.0 on macOS (Darwin 25.4.0). sips ships with macOS; on Linux the skill falls back to ImageMagick convert.
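A quick preflight check for these requirements might look like the sketch below. It only tests whether the `codex`, `sips`, and `convert` executables are on PATH; it does not verify the Codex version or login state. The function names `pick_resizer` and `preflight` are hypothetical and not part of the skill.

```python
import shutil
from typing import Callable, List, Optional

Lookup = Callable[[str], Optional[str]]


def pick_resizer(which: Lookup = shutil.which) -> Optional[str]:
    """Prefer sips (ships with macOS), then fall back to ImageMagick convert."""
    for tool in ("sips", "convert"):
        if which(tool):
            return tool
    return None


def preflight(which: Lookup = shutil.which) -> List[str]:
    """Return human-readable problems; an empty list means the tools were found."""
    problems = []
    if not which("codex"):
        problems.append("codex not found on PATH (npm i -g @openai/codex)")
    if pick_resizer(which) is None:
        problems.append("neither sips nor convert found for resizing")
    return problems
```

The `which` parameter is injectable only so the lookup can be faked in tests; in normal use `preflight()` with no arguments is enough.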

Installation

Option A — clone into the skills directory (recommended for daily use)

git clone https://github.com/JunSeo99/claude-skill-codex-imagegen.git
mkdir -p ~/.claude/skills
ln -s "$(pwd)/claude-skill-codex-imagegen/skill" ~/.claude/skills/codex-imagegen

Symlinking from skill/ lets you git pull to update without re-copying files.

Option B — install the prebuilt .skill bundle

A pre-packaged distributable lives in dist/codex-imagegen.skill (it's a zip with a .skill extension).

git clone https://github.com/JunSeo99/claude-skill-codex-imagegen.git
mkdir -p ~/.claude/skills
unzip claude-skill-codex-imagegen/dist/codex-imagegen.skill -d ~/.claude/skills/

Option C — copy the folder

git clone https://github.com/JunSeo99/claude-skill-codex-imagegen.git
mkdir -p ~/.claude/skills
cp -r claude-skill-codex-imagegen/skill ~/.claude/skills/codex-imagegen

Restart Claude Code (or start a new session) so the skill is discovered.

Usage

The skill activates on phrases such as:

  • "generate an image", "make an icon", "create a banner", "OG image"
  • "hero illustration", "make a favicon", "brand mark", "product shot"
  • "imagegen", "GPT Image 2", "codex image"

Multilingual triggers are supported via the skill's description field — localized prompts (Korean, Japanese, etc.) work without configuration.

Basic usage — single asset

Any request that produces a visual file saved to disk:

You: Make a 512×512 hero icon for my landing page — a single seedling growing from a flat horizon, line-art only, no text.

Claude: (invokes the skill, composes a five-part prompt, runs codex exec in the safe Mode A, parses the absolute path from stdout, copies the file with cp and resizes it with sips to ./assets/hero-icon.png, then opens the file with Read to verify it matches your intent)

By default Claude runs Codex without --dangerously-bypass-approvals-and-sandbox and does the cp/sips step itself, in its own approved-tool context. The Codex sub-agent never gets carte-blanche shell access during normal use.
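The "parses the absolute path from stdout" step could be approximated as below. This is a sketch under assumptions: the exact wording of Codex's output is not documented here, so the pattern only looks for an absolute path of the shape `.codex/generated_images/<session>/ig_*.png`, and `extract_image_path` is a hypothetical name.

```python
import re
from typing import Optional

# Assumed shape: /<home>/.codex/generated_images/<session>/ig_<id>.png
_IMAGE_PATH = re.compile(
    r"/[^\s'\"`]*/\.codex/generated_images/[^\s/'\"`]+/ig_[^\s/'\"`]+\.png"
)


def extract_image_path(stdout: str) -> Optional[str]:
    """Return the last generated-image path mentioned in stdout, or None."""
    matches = _IMAGE_PATH.findall(stdout)
    return matches[-1] if matches else None
```

Taking the last match is a design choice for the case where a run mentions more than one image; a stricter caller could instead fail when the count is not exactly one.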

If you want a single-step automated flow (Mode B) — e.g. for batching — you can opt in:

You: Generate these 12 favicons in automated mode.

Claude: (after confirming, runs Codex with --dangerously-bypass-approvals-and-sandbox so the sub-agent handles cp/sips itself)

Read SECURITY.md before opting in.

For complex prompts (text in the image, photo edits, brand assets), Claude reads references/prompting-guide.md before generating to apply the structured prompt template and avoid known pitfalls.

For full-site image sets — pair with a DESIGN.md

For projects that need a coherent visual language across multiple slots — hero, OG card, empty states, illustrations, favicons — drop a DESIGN.md at the project root with your palette, typography, and illustration style. Then ask Claude Code:

You: Using DESIGN.md as the style reference, insert images that fit the site.

Claude reads DESIGN.md, scans the codebase for slots that need imagery, writes prompts that incorporate the palette and tone, calls this skill for each, and inserts the resulting paths into the right <img> tags. The hero image, the empty-state illustration, and the OG card all end up looking like they belong to the same product.

A minimal DESIGN.md that works well:

# Design

## Concept
Calm, considered, modern.

## Palette
- Surface (main):  #F4F1ED  — warm off-white
- Surface (cards): #FFFFFF
- Text:            #1A1A1A
- Accent / CTA:    #C46A4E  — soft terracotta, used sparingly

## Typography
- Inter, system-ui sans-serif

## Illustration style
- Single subject, plenty of whitespace, no busy backgrounds
- Soft natural light from upper left
- No text inside images unless explicitly asked
- Avoid stock-photo vibes and over-saturated colors

The qualitative Illustration style block carries most of the consistency work. Palette obviously matters too, but it's the descriptive instructions ("hand-folded paper feel", "no busy backgrounds", "warm tones") that keep each image from looking like it came from a different stock-image library.
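To show how a DESIGN.md like the one above could feed into prompts, here is a minimal sketch that pulls the palette hex codes and the illustration-style bullets into a reusable prompt suffix. In the skill itself Claude reads the file directly; the parsing rules and the names `parse_design` and `style_suffix` are assumptions made for illustration.

```python
import re
from typing import Dict, List


def parse_design(markdown: str) -> Dict[str, List[str]]:
    """Group the bullet lines of a DESIGN.md under their '## ' headings."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None and line.lstrip().startswith("- "):
            sections[current].append(line.lstrip()[2:].strip())
    return sections


def style_suffix(markdown: str) -> str:
    """Build a prompt suffix from palette hex codes and style bullets."""
    sections = parse_design(markdown)
    palette_text = "\n".join(sections.get("Palette", []))
    colors = re.findall(r"#[0-9A-Fa-f]{6}\b", palette_text)
    style = sections.get("Illustration style", [])
    parts = []
    if colors:
        parts.append("Palette: " + ", ".join(colors) + ".")
    if style:
        parts.append("Style: " + "; ".join(style) + ".")
    return " ".join(parts)
```

Appending the same suffix to every prompt in a batch is one simple way to keep a hero image, an OG card, and an empty-state illustration visually related.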

Before / After — what the image layer changes

To make the difference concrete, here's the same coffee-shop landing page built two ways. Identical component code in both — same Next.js 15, same Tailwind, same shadcn-style markup, same content, same navigation. The only thing that varies is the image layer.

| Without images | With images |
| --- | --- |
| [Image: coffee landing page built with shadcn defaults, Lucide icons, and a purple-blue gradient; no real images, classic AI-default visual stack] | [Image: same coffee landing page built with a DESIGN.md and product photography generated by this skill via gpt-image-2; calm blue-gray palette, detailed coffee-bag product shots, editorial brewing photos] |
| 0 images. Lucide Coffee over a purple-blue gradient hero, Bean icons inside product cards, Sparkles over a gradient story, Droplet/Flame brewing icons. The textbook AI default stack. | 8 images generated by this skill with a DESIGN.md at the project root. Hero photography, five custom coffee-bag product shots (origin, roast, tasting notes, roast/best-by dates, brew recipe all on-label), a roastery story background, three brewing macro shots. |

Both pages were produced in the same session. The right-hand one took roughly one extra command — "Using DESIGN.md as the style reference, insert images that fit the site." The demo project itself is intentionally kept outside this repo to keep the skill bundle small.

What's in the skill

skill/
├── SKILL.md                  Workflow (Mode A & B), recipes, failure modes, triggers
├── references/
│   ├── prompting-guide.md    5-part structure, text rendering, edit pattern, anti-patterns
│   └── cli-reference.md      codex exec flags, output paths, sips/convert post-processing, costs
└── assets/
    └── hero.png              Sample 1600×900 hero image generated by gpt-image-2

The five-part prompt structure (drawn from fal.ai and the OpenAI Cookbook) is:

  1. Scene/context — environment, time of day, mood
  2. Subject — main figure or object
  3. Details — style, medium, lighting, lens, color, texture
  4. Use case — drives output size and aspect
  5. Constraints — what to preserve, what to forbid
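The five parts above can be captured in a small data structure so that no part is forgotten. This is only a sketch: the `ImagePrompt` class and the labeled-line output format are assumptions for illustration, not the template in references/prompting-guide.md.

```python
from dataclasses import dataclass


@dataclass
class ImagePrompt:
    scene: str        # environment, time of day, mood
    subject: str      # main figure or object
    details: str      # style, medium, lighting, lens, color, texture
    use_case: str     # drives output size and aspect
    constraints: str  # what to preserve, what to forbid

    def render(self) -> str:
        """Join the five parts, in order, as labeled lines."""
        parts = [
            ("Scene", self.scene),
            ("Subject", self.subject),
            ("Details", self.details),
            ("Use case", self.use_case),
            ("Constraints", self.constraints),
        ]
        return "\n".join(f"{label}: {text}" for label, text in parts)
```

For the seedling icon from the Usage section, the subject would be "a single seedling growing from a flat horizon", the details "line-art only", the use case "512×512 hero icon for a landing page", and the constraints "no text".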

Cost

  • ChatGPT/Codex subscription: one image turn uses roughly as much of the usage limit as 3–5 text turns
  • API key mode (OPENAI_API_KEY set): priced per image, typically $0.04 – $0.35
    • Image output: $30.00 / 1M output tokens
    • Image input: $8.00 / 1M input tokens ($2.00 / 1M cached input)
    • Plus text-input tokens for your prompt (see OpenAI pricing for the current text rate on gpt-image-2)

For batch work (10+ images), the API key mode is generally cheaper than the subscription.
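As a rough worked example of the API-mode arithmetic, the sketch below applies the per-million-token rates listed above. The token counts passed in are hypothetical, and the text-input tokens for the prompt are left out because their rate is not given here.

```python
# Rates from the list above, in USD per 1M tokens.
IMAGE_OUTPUT_PER_M = 30.00
IMAGE_INPUT_PER_M = 8.00
CACHED_INPUT_PER_M = 2.00


def estimate_cost(output_tokens: int,
                  input_tokens: int = 0,
                  cached_input_tokens: int = 0) -> float:
    """Estimate image-token cost in USD, excluding text-input tokens."""
    total = (
        output_tokens * IMAGE_OUTPUT_PER_M
        + input_tokens * IMAGE_INPUT_PER_M
        + cached_input_tokens * CACHED_INPUT_PER_M
    )
    return total / 1_000_000
```

For instance, a generation that emits a hypothetical 2,000 image output tokens would come to about $0.06 by this formula, and a 12-image batch at that size to about $0.72.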

Known limitations of gpt-image-2

| Limitation | Workaround the skill applies |
| --- | --- |
| Output size doesn't match request | Always adds "at exactly WxH pixels"; host runs sips -z H W (macOS) or convert -resize WxH! (Linux) |
| No transparent PNG | Documents the limitation; suggests gpt-image-1.5 via Image API or post-processing |
| Long multi-line text passages, brand names, and very small text in dense layouts still wobble (short labels and CJK render near-perfectly) | EXACT TEXT marker + double quotes for literal strings; letter-by-letter spelling for brand names; HTML/CSS overlay for paragraph-length text |
| Latency up to 2 min on complex prompts | Bash timeout set to 300000 ms |
| Imprecise element placement in complex layouts | Falls back to simplification or SVG-then-rasterize suggestion |
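The size workaround in the first row can be sketched as a helper that builds the resize command for either tool. Note that `sips -z` takes height before width, while ImageMagick's geometry is width first with a trailing `!` to force the exact size. The function name `resize_command` is hypothetical.

```python
from typing import List


def resize_command(tool: str, src: str, dest: str,
                   width: int, height: int) -> List[str]:
    """Build an argv list that resizes src to exactly width x height at dest."""
    if tool == "sips":
        # sips expects height first, then width.
        return ["sips", "-z", str(height), str(width), src, "--out", dest]
    if tool == "convert":
        # The trailing ! tells ImageMagick to ignore the aspect ratio.
        return ["convert", src, "-resize", f"{width}x{height}!", dest]
    raise ValueError(f"unsupported resize tool: {tool}")
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Both forms force the exact dimensions, so a source with a different aspect ratio will be stretched; asking for a matching aspect in the prompt keeps that distortion small.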

Compatibility

| Component | Tested |
| --- | --- |
| codex-cli | 0.130.0 |
| OS | macOS (Darwin 25.4.0); Linux untested but expected to work with ImageMagick fallback |
| Claude Code | App / CLI (filesystem skills) |

Output-path layout under ~/.codex/generated_images/ and $imagegen invocation semantics are not part of the Codex CLI's public contract. If a future codex-cli release changes them, please open an issue with the new behavior.

Contributing

Issues and PRs welcome. Useful directions:

  • Linux-side post-processing parity (ImageMagick verified end-to-end)
  • Additional recipes (favicons, app store screenshots, social card pipelines)
  • Improved non-Latin text rendering tips (CJK, Arabic, Devanagari, etc.)
  • Migration notes when newer Codex CLI versions change $imagegen behavior

When changing the skill body, run the validators from anthropics/skills:

python3 path/to/skill-creator/scripts/quick_validate.py skill/
python3 path/to/skill-creator/scripts/package_skill.py skill/ dist/

Security

See SECURITY.md for the trust boundary and the threat model around --dangerously-bypass-approvals-and-sandbox.

Changelog

See CHANGELOG.md.

Acknowledgements

This project is independent and not affiliated with, endorsed by, or sponsored by Anthropic or OpenAI. "Claude", "Claude Code", "OpenAI", "Codex", and "GPT" are trademarks of their respective owners.

License

MIT © 2026 JunSeo99
