Model Selection Guide¶
Storywright uses different AI models for different tasks. Choosing the right model for each role is the single biggest lever you have over quality, speed, and cost.
The Three Model Roles¶
Every story in Storywright uses three model roles (plus an embedding model, which is selected automatically). You can set each of the three independently — in Settings → Models for global defaults, or per-story in the story workspace.
Writing Model¶
Used for scene generation, revision, and rewriting. This is where quality matters most — the writing model produces the actual prose your readers will see.
What to look for: creativity, voice consistency, ability to follow complex instructions, strong prose quality.
Planning Model¶
Used for scene planning, plan refinement, and story metadata derivation. Needs good reasoning ability to break down story arcs into scenes, balance pacing, and respect your story structure template.
What to look for: reasoning ability, structured output, ability to follow multi-step instructions. Thinking/reasoning models work well here.
Extraction Model¶
Used for summarization, memory extraction, continuity updates, and thread extraction. These are structured tasks — the model follows a schema and pulls facts from text. This model runs 3–4 times per scene (once for each post-processing step), so cost and speed matter.
What to look for: fast, cheap, reliable instruction-following. Creativity is irrelevant. Avoid reasoning/thinking models here — they're slow and expensive for tasks that don't benefit from deep reasoning.
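Because the extraction model is called several times per scene, its price dominates the post-processing bill. A rough back-of-envelope comparison in Python — the per-token prices and token counts below are illustrative assumptions, not Storywright measurements:

```python
# Rough per-story cost comparison for the extraction role.
# All prices and token counts below are illustrative assumptions.

def extraction_cost(scenes, calls_per_scene, tokens_per_call, price_per_mtok):
    """Total extraction cost for a story, in dollars."""
    total_tokens = scenes * calls_per_scene * tokens_per_call
    return total_tokens * price_per_mtok / 1_000_000

# A 20-scene story, 4 post-processing calls per scene (summary,
# memory, continuity, threads), ~3k tokens per call.
cheap = extraction_cost(20, 4, 3_000, price_per_mtok=0.10)   # small fast model
pricey = extraction_cost(20, 4, 3_000, price_per_mtok=3.00)  # premium model

print(f"cheap model:   ${cheap:.2f}")   # ~$0.02
print(f"premium model: ${pricey:.2f}")  # ~$0.72
```

Same calls, same tokens — the only variable is the model's price, which is why a small fast model is the right default here.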
Embedding Model¶
Used for semantic memory recall — converting text into vectors so memories from earlier scenes can be recalled by meaning, not just keywords.
The embedding model is auto-selected based on your provider and doesn't appear in the model picker. If you use on-device embedding (Settings → Memory), the app uses a built-in MiniLM model with no API calls required.
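Recall by meaning boils down to ranking stored memory vectors by similarity to a query vector. A minimal sketch — in the app the vectors come from the embedding model (e.g. MiniLM); the toy 3-dimensional vectors here are made up for illustration:

```python
# Minimal sketch of semantic memory recall: rank stored memories by
# cosine similarity to a query embedding. Real embeddings have hundreds
# of dimensions; these toy 3-d vectors are illustrative only.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

memories = {
    "Mara lost her locket at the harbor": [0.9, 0.1, 0.0],
    "The council voted to close the mines": [0.1, 0.8, 0.3],
    "A storm destroyed the fishing fleet": [0.7, 0.0, 0.5],
}
query = [0.8, 0.1, 0.4]  # pretend embedding of "what happened at sea?"

ranked = sorted(memories, key=lambda m: cosine(memories[m], query), reverse=True)
print(ranked[0])  # the memory closest in meaning, not keyword overlap
```

Note that the top match shares no keywords with the query — that is the point of embedding-based recall.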
Recommended Models by Provider¶
NanoGPT / Open Weights¶
NanoGPT gives you access to many open-weight and commercial models under one API key. Here are the best picks for each role:
| Role | Recommended Models |
|---|---|
| Writing | Euryale 70B (Sao10K), Magnum 72B (anthracite-org), Anubis 70B, Mistral Small Creative |
| Planning | GLM-5.1 Thinking, Gemini 2.5 Flash, DeepSeek R1 |
| Extraction | Gemini Flash, Llama 3.2 8B, any small fast model |
OpenAI¶
| Role | Recommended Models |
|---|---|
| Writing | GPT-4.1, GPT-4o |
| Planning | o4-mini, o3 |
| Extraction | GPT-4.1-nano, GPT-4.1-mini |
Anthropic (via NanoGPT or direct)¶
| Role | Recommended Models |
|---|---|
| Writing | Claude Sonnet 4, Claude Sonnet 4.5 |
| Planning | Claude Opus 4 |
| Extraction | Claude Haiku 3.5 |
Google Gemini¶
| Role | Recommended Models |
|---|---|
| Writing | Gemini 2.5 Pro |
| Planning | Gemini 2.5 Pro |
| Extraction | Gemini 2.0 Flash |
Local Models (Ollama / LM Studio)¶
| Role | Recommended Models |
|---|---|
| Writing | Llama 3.3 70B, Mistral Large, Qwen 2.5 72B — larger models produce better fiction |
| Planning | DeepSeek R1, Qwen 2.5 72B — reasoning-capable models |
| Extraction | Llama 3.2 8B, Phi-3 Mini, Gemma 2 9B — small and fast |
Note: Local models need sufficient VRAM. A 70B model typically requires 40+ GB of VRAM (or RAM with CPU inference, which is much slower). If you have limited hardware, use a smaller writing model and pair it with an 8B extraction model.
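The 40+ GB figure follows from simple arithmetic: parameter count times bytes per weight, plus overhead for the KV cache and activations. A sketch using rule-of-thumb numbers (the 20% overhead factor is an assumption, and real requirements vary by runtime and context length):

```python
# Rough VRAM estimate for a local model: parameters x bytes per weight,
# plus ~20% overhead for KV cache and activations. These are rules of
# thumb, not exact requirements.

def vram_gb(params_billions, bytes_per_weight, overhead=1.2):
    return params_billions * bytes_per_weight * overhead

print(f"70B @ 4-bit: ~{vram_gb(70, 0.5):.0f} GB")   # ~42 GB
print(f"70B @ fp16:  ~{vram_gb(70, 2.0):.0f} GB")   # ~168 GB
print(f"8B  @ 4-bit: ~{vram_gb(8, 0.5):.0f} GB")    # ~5 GB
```

This is why 4-bit quantization is the usual way to run 70B models locally, and why an 8B extraction model fits comfortably on consumer GPUs.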
Budget Tiers¶
Free — Local Models¶
Run everything locally with Ollama or LM Studio. No API costs.
- Best for: privacy, experimentation, unlimited generation
- Trade-off: quality depends on your hardware. 8B models are fast but produce noticeably lower-quality fiction than cloud models. 70B+ models are competitive with cloud but require serious hardware.
- Tip: Use a large model for writing and a small model for extraction. The extraction model runs frequently but doesn't need creativity.
Budget — $0.50–$2.00 per story¶
Use affordable API models: Gemini Flash, GPT-4.1-mini, or NanoGPT's cheaper open-weight models.
- Best for: daily writing, high-volume generation, drafting
- Tip: Set extraction to the cheapest model available (GPT-4.1-nano, Gemini Flash). It runs 3–4x per scene and doesn't need quality — just reliability.
Premium — $2.00–$5.00 per story¶
Use top-tier models for writing: GPT-4.1, Claude Sonnet 4, Gemini 2.5 Pro.
- Best for: polished output, final drafts, maximum quality
- Tip: Mix tiers. Use a premium writing model, a mid-tier planning model, and a budget extraction model. This gets you top prose quality without premium prices on every API call.
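To see why mixing tiers pays off, consider a per-role cost split. The token volumes and prices below are assumptions for the sake of the arithmetic, not app measurements:

```python
# Illustrative per-story cost when mixing tiers across roles.
# Token volumes and prices are assumptions, not app measurements.

PRICES = {"premium": 3.00, "mid": 0.60, "budget": 0.10}  # $ per 1M tokens

def story_cost(tokens_by_role, tier_by_role):
    return sum(tokens * PRICES[tier_by_role[role]] / 1_000_000
               for role, tokens in tokens_by_role.items())

tokens = {"writing": 400_000, "planning": 100_000, "extraction": 240_000}

mixed = story_cost(tokens, {"writing": "premium", "planning": "mid",
                            "extraction": "budget"})
all_premium = story_cost(tokens, {role: "premium" for role in tokens})

print(f"mixed tiers: ${mixed:.2f}")   # ~$1.28
print(f"all premium: ${all_premium:.2f}")  # ~$2.22
```

Under these assumptions, the writing model accounts for nearly all of the mixed-tier cost — exactly where the quality shows up in the final prose.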
Auto-Select¶
Storywright can automatically pick the best model for each role based on what's available from your configured providers.
How it works: When you click Auto-select in the model picker (or when models are first loaded), the app scans your available model list and picks the best match for each role using a preference chain — checking for known high-quality models first, then falling back to the best available option.
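A preference chain like this can be sketched as an ordered walk over known-good names with a fallback. The chain contents and matching logic below are hypothetical — Storywright's actual chain is internal:

```python
# Hypothetical sketch of preference-chain auto-select: try each
# preferred name pattern in order, pick the first available match,
# else fall back to the first available model. Patterns and model
# names here are illustrative, not Storywright's real chain.

def auto_select(available, preference_chain):
    for preferred in preference_chain:
        for model in available:
            if preferred in model.lower():
                return model
    return available[0] if available else None

available = ["gemini-2.0-flash", "gpt-4.1-nano", "llama-3.2-8b"]
extraction_chain = ["nano", "flash", "mini", "8b"]
print(auto_select(available, extraction_chain))
```

The chain encodes role-specific priorities, so the same provider list can yield different picks for writing, planning, and extraction.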
Models marked with ✦ Recommended in the model picker are ones Storywright considers particularly well-suited for that role based on the model's known characteristics.
When to use auto-select:

- First-time setup — let the app pick sensible defaults
- After adding a new provider — auto-select will consider the new models
- When you're not sure which model to use
When to pick manually:

- You know exactly which model you want
- You want to experiment with a specific model
- You're optimizing cost by mixing budget and premium models
Tips¶
- The extraction model is your biggest cost lever. It runs 3–4 times per scene (summary, memory, continuity, threads). Switching from a premium model to a fast/cheap one here can cut your per-story cost in half with no noticeable quality loss.
- Thinking/reasoning models (o3, DeepSeek R1, models with "thinking" in the name) are great for planning but terrible for extraction — they're slow and expensive for tasks that don't benefit from deep reasoning. Storywright automatically avoids them for the extraction role during auto-select.
- Per-story overrides let you use a premium model for your main project and a budget model for experiments — without changing global settings.
- Test with a short story first. Generate a 3-scene story to evaluate a model before committing to a full-length project.
- Local + cloud hybrid: Use a local extraction model (free, fast) and a cloud writing model (best quality). Set up two providers — one local, one cloud — and assign each to its role.
Related Guides¶
- Settings — configuring providers and model roles
- Generation — how models are used during generation
- Context Systems — how memory and continuity work