Most chatbots do one thing. You type a message, a model generates a response, and that's it. The prompt stays the same regardless of whether you're asking for code, writing a cover letter, or trying to understand a research paper. Every question gets run through the same system prompt, the same temperature, the same model.
This works fine for casual use. It falls apart when you need the chatbot to actually be good at different tasks. A system prompt optimized for creative writing will produce flowery code comments. A prompt tuned for precise technical answers will write stiff, robotic blog posts. You can't serve both well with a single configuration.
That's the problem multi-mode solves. Instead of one chatbot with one personality, you build a chatbot that switches between specialized configurations based on what the user needs. Each mode gets its own system prompt, its own model selection, its own temperature, and its own set of behavioral rules.
I built a 16-mode chatbot that runs entirely on free APIs. This article walks through the architecture, the key decisions, and the gotchas I hit along the way.
What "multi-mode" actually means
A mode is a configuration bundle. At minimum, each mode defines three things:
- A system prompt tailored to that specific task
- A model selection (some tasks need stronger reasoning, others need speed)
- A temperature setting (creative tasks run hotter, factual tasks run cold)
In practice, most modes also include:
- A display name and description for the UI
- A response format preference (plain text, markdown, code blocks)
- Token limits and context window management rules
- Post-processing steps (like stripping markdown fences from code-only modes)
The chatbot I run at helloandy.net/ai-chat has 16 modes, including general chat, code generation, research synthesis, creative writing, text humanization, debate, and several others. Each mode behaves differently enough that users notice when they switch.
The key insight: modes aren't just prompt templates. They're complete behavioral profiles. Switching from "Research" mode to "Creative Writing" mode should feel like talking to a different specialist, not like the same person putting on a different hat.
Why this matters more than you'd think
I ran an experiment early on. I gave users access to two versions of the same chatbot: one with a single general-purpose prompt, and one with mode switching. Same underlying model. Same API. The only difference was whether the system prompt changed based on the task.
The multi-mode version scored meaningfully higher on user satisfaction across every category except one (trivial factual questions, where both performed identically). The biggest gap was in code generation and creative writing — tasks where the ideal output style differs wildly from the default.
There's a second benefit that's harder to measure but just as real: users become more specific about what they want. When you present a mode selector, people think for a moment about what kind of help they need. That extra second of intentionality leads to better prompts, which leads to better outputs, which leads to the chatbot appearing smarter than it actually is.
Architecture overview
Here's the structure I landed on after several iterations. It's straightforward, which is the point — complexity in chatbot architecture usually means you've over-engineered something.
The mode registry
Every mode lives in a simple configuration object. No database, no external config files. Just a JavaScript object that maps mode IDs to their settings.
const modes = { "general": { name: "General", description: "All-purpose conversation", model: "google/gemini-2.0-flash-exp:free", temperature: 0.7, systemPrompt: `You are a helpful assistant. Be direct and concise. Avoid filler phrases.` }, "code": { name: "Code", description: "Write and debug code", model: "google/gemini-2.0-flash-exp:free", temperature: 0.2, systemPrompt: `You are a senior software engineer. Write clean, production-ready code. Always include error handling. Prefer readability over cleverness. Use comments only when the why isn't obvious.` }, "creative": { name: "Creative Writing", description: "Stories, poetry, and prose", model: "google/gemini-2.0-flash-exp:free", temperature: 0.95, systemPrompt: `You are a creative writing partner. Write with vivid sensory detail. Vary sentence length. Show, don't tell. Avoid cliches and stock phrases.` } };
A few things to notice. The model field points to a free model on OpenRouter. Temperature varies dramatically — 0.2 for code (you want deterministic output) versus 0.95 for creative writing (you want surprise). And the system prompts are short and specific rather than long and vague.
The API layer
Every request goes through the same function. The mode just determines which parameters get passed.
async function chat(userMessage, modeId, history) { const mode = modes[modeId]; const messages = [ { role: "system", content: mode.systemPrompt }, ...history, { role: "user", content: userMessage } ]; const response = await fetch("https://openrouter.ai/api/v1/chat/completions", { method: "POST", headers: { "Authorization": `Bearer ${apiKey}`, "Content-Type": "application/json" }, body: JSON.stringify({ model: mode.model, messages, temperature: mode.temperature, max_tokens: mode.maxTokens || 2048 }) }); return response.json(); }
That's it. The entire API layer is one function. If you've read the OpenRouter free API guide, you'll recognize this pattern — it's the standard OpenAI-compatible endpoint with a model string swap.
The UI layer
The frontend needs exactly two things: a mode selector and a chat interface. I use a horizontal pill bar at the top that shows all available modes. Clicking one switches the active mode and updates a visual indicator so the user always knows which mode they're in.
One design decision that made a noticeable difference: showing the mode name in the assistant's message bubble. When users see "Code mode" or "Research mode" next to the response, it reinforces that the chatbot is behaving differently. Without that label, some users didn't realize mode switching was doing anything.
Key technical decisions (and why I made them)
Decision 1: Separate system prompts vs. a mega-prompt
The tempting approach is to write one giant system prompt that says "if the user asks for code, do X; if they ask for writing, do Y." I tried this. It doesn't work well past about four modes. The model starts blending behaviors, applying code formatting rules to creative writing and vice versa.
Separate system prompts per mode fix this completely. Each mode gets a clean slate of instructions. The model doesn't need to figure out which section applies — it just follows the one prompt it was given.
The tradeoff is that you maintain more prompt text. But prompts are cheap to store and easy to version. The behavioral improvement is worth the extra maintenance.
Decision 2: Per-mode temperature
Temperature is the single most impactful parameter in mode differentiation. Two identical prompts at different temperatures produce dramatically different outputs.
My settings after months of tuning:
- Code generation: 0.2 — You want the same correct answer every time
- General chat: 0.7 — Balanced between consistent and interesting
- Research synthesis: 0.3 — Accuracy matters more than style
- Creative writing: 0.95 — High variance produces better prose
- Debate mode: 0.8 — Needs variety but not randomness
- Brainstorming: 1.0 — Maximum divergent thinking
The common mistake is setting everything to 0.7 and calling it done. That's leaving quality on the table for every mode that isn't general conversation.
Decision 3: Conversation history across mode switches
When a user switches from Code mode to General mode mid-conversation, do you keep the history? My answer: yes, always. Here's why.
Users switch modes because the conversation evolved. They wrote some code, now they want to explain it in a README. Wiping history would force them to re-paste everything. That's a terrible experience.
The concern is that old messages with a different system prompt might confuse the model. In practice, this rarely causes problems. The current system prompt dominates behavior. The history provides useful context. The model handles the transition smoothly.
If you're worried about context window limits, trim older messages from the history rather than clearing on mode switch. Keep the last 10-15 exchanges and let the system prompt do its job.
Decision 4: Which free models to use
Not all free models are equal, and different modes can benefit from different models. Here's my current allocation:
- Most modes: Gemini 2.0 Flash (fast, capable, generous free tier)
- Reasoning-heavy modes: DeepSeek R1 or Qwen (better at step-by-step logic)
- Creative modes: Gemini or Llama (good at varied, natural-sounding output)
OpenRouter makes this easy because you can switch models per request without managing multiple API integrations. One API key, one endpoint, many models. I wrote a full breakdown of the available free models and their strengths in the OpenRouter free API guide.
Building the mode selector UI
The UI component is simpler than you'd expect. Here's a minimal version:
function ModeSelector({ modes, activeMode, onSelect }) { return ( <div className="mode-bar"> {Object.entries(modes).map(([id, mode]) => ( <button key={id} className={id === activeMode ? "active" : ""} onClick={() => onSelect(id)} > {mode.name} </button> ))} </div> ); }
Style it however fits your app. The important UX details:
- Make the active mode visually obvious (color change, not just bold text)
- Show mode descriptions on hover or in a tooltip
- Put the mode selector above the chat, not in a sidebar — users forget sidebars exist
- Consider grouping modes into categories if you have more than 8
Advanced: reasoning chains and structured thinking
Some modes benefit from structured reasoning before generating a response. The research and analysis modes in my chatbot use a form of graph-of-thought prompting where the model breaks the question into sub-problems, evaluates each one, and synthesizes a final answer.
This is where multi-mode architecture really shines. You'd never want graph-of-thought reasoning for a quick code snippet or a casual chat — it adds latency and the structured thinking would feel weird. But for research questions or complex analysis, the extra processing time is worth the quality bump.
The implementation looks like this: certain modes include a reasoning preamble in their system prompt that instructs the model to think through the problem before responding. The frontend shows a "thinking" indicator during this phase. The final response strips out the reasoning steps and presents just the conclusion (with an expandable section for users who want to see the work).
// Research mode system prompt (simplified) const researchPrompt = `You are a research analyst. Before answering, work through these steps: 1. Identify the core question and any sub-questions 2. Consider what evidence would answer each part 3. Note any assumptions or gaps in available info 4. Synthesize your findings into a clear answer Present your final answer after the analysis. Mark your reasoning with <thinking> tags.`;
This pattern lets you add reasoning depth to specific modes without slowing down the modes that don't need it.
Handling edge cases
Rate limits on free APIs
Free API tiers have rate limits. If you're using OpenRouter's free models, you'll hit limits during peak hours. Build for this from day one.
My approach: maintain a fallback model list per mode. If the primary model returns a 429 (rate limited), try the next model on the list. Most modes have two or three fallbacks. Users see a brief delay, not an error message.
const codeFallbacks = [ "google/gemini-2.0-flash-exp:free", "deepseek/deepseek-chat-v3-0324:free", "meta-llama/llama-3.3-70b-instruct:free" ]; async function chatWithFallback(messages, models, temp) { for (const model of models) { const res = await callApi(messages, model, temp); if (res.status !== 429) return res; } throw new Error("All models rate limited"); }
Context window management
Different models have different context windows. When a user has a long conversation and switches to a mode that uses a model with a smaller context window, you need to handle the overflow gracefully.
Count tokens before sending. If the conversation exceeds the model's limit, trim from the oldest messages. Always keep the system prompt and the most recent user message intact. The middle of the conversation is the safest to trim — the model can usually infer what was discussed from the remaining context.
Mode-specific post-processing
Some modes need output cleanup. Code mode should strip markdown fences if you're rendering in a code editor. Humanizer mode should remove AI-typical phrases. Research mode should format citations consistently.
Build a post-processing hook into each mode configuration:
"code": { // ... other config postProcess: (text) => { // Strip markdown code fences if present return text.replace(/^```\w*\n?/, "") .replace(/\n?```$/, ""); } }
What I'd do differently
After running 16 modes in production for a while, here's what I've learned:
Start with 4-5 modes, not 16. I added modes too fast. Some of them overlap enough that they confuse users. General, Code, Research, and Creative Writing cover 80% of use cases. Add more only when you have clear evidence that an existing mode isn't serving a specific need.
Invest in system prompts early. The system prompt is the most important part of each mode. Spend time on it. Test it with real queries. Iterate. A mediocre system prompt with a great model produces worse results than a great system prompt with a mediocre model.
Track which modes people actually use. I added analytics after a month and discovered that three of my modes had almost zero usage. Those modes were solving problems nobody had. I kept them but moved them below the fold.
Let users customize. Some users wanted to tweak the temperature or edit the system prompt for a mode. I haven't built this yet, but it's the most-requested feature. If you're building from scratch, consider an "advanced settings" panel per mode.
Getting started
Here's the minimum viable implementation:
- Get a free API key from OpenRouter (takes 30 seconds)
- Define 3-4 modes with distinct system prompts and temperatures
- Build a single API function that accepts a mode parameter
- Add a mode selector to your UI
- Ship it, then iterate on prompts based on real usage
The entire backend can be done in under 100 lines of code. The frontend mode selector is another 50. You don't need a framework, a database, or a deployment pipeline to get a working prototype.
If you want to see what a finished version looks like, try the 16-mode chatbot at helloandy.net/ai-chat. Every mode described in this article is running there, free to use, no account required.
helloandy.net provides free AI tools and tutorials for developers. No account required.