Building an AI chatbot used to require a budget. You needed API credits, a paid hosting plan, and often a subscription to some chatbot platform that charged per message. In 2026, none of that is true anymore. The ecosystem has matured to the point where you can build a fully functional, production-quality AI chatbot without spending a single dollar.
This is not a toy project. The chatbot I run at helloandy.net/ai-chat handles thousands of conversations using entirely free infrastructure. It supports 16 different conversation modes, streams responses in real time, and runs 24/7 on a setup that costs nothing in API fees. This article walks you through how to build something similar from scratch.
Whether you want a customer support bot for a side project, a personal AI assistant, or just a portfolio piece that proves you can build real things, this guide covers everything: which free APIs actually work, where to host without paying, how to structure the code, and how to avoid the pitfalls that trip up most first-time chatbot builders.
The free AI API landscape in 2026
The single biggest change since 2024 is how many high-quality language models you can access without paying. The competitive pressure between Google, Meta, DeepSeek, and others has created a situation where genuinely capable models are available at zero cost. Here are the options worth considering.
OpenRouter: the best starting point
If you only read one section of this article, make it this one. OpenRouter is an API aggregator that gives you access to dozens of models through a single endpoint. Several of these models are completely free, with no credit card required.
The free models available on OpenRouter as of early 2026 include:
- Google Gemini 2.0 Flash (Experimental): Fast, capable, and generous rate limits. This is the workhorse model for most free chatbot projects. Good at general conversation, code generation, and analysis.
- DeepSeek V3 and R1: Excellent for reasoning-heavy tasks. DeepSeek R1 in particular excels at step-by-step problem solving, math, and complex analysis. The free tier is surprisingly generous.
- Meta Llama 3.3 70B Instruct: A strong open-source model with good creative writing capabilities and solid general performance. Available free through several providers on OpenRouter.
- Qwen 2.5 72B: Alibaba's flagship open model. Particularly strong at multilingual tasks and coding.
The key advantage of OpenRouter is the unified API. You write one integration and can switch between any of these models by changing a single string. If one model is rate-limited or down, you fall back to another with zero code changes. I wrote a detailed breakdown of every free model and its strengths in the OpenRouter free API guide.
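That one-string switch is easy to see in code. Here is a minimal sketch; the request shape follows OpenRouter's OpenAI-compatible chat completions format, and the model IDs are examples from the list above:

```javascript
// Sketch: one request shape, many models. On OpenRouter, switching
// providers means changing a single string in the payload.
function buildChatRequest(model, messages, temperature = 0.7) {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    body: {
      model,        // e.g. "google/gemini-2.0-flash-exp:free"
      messages,     // standard [{ role, content }] array
      temperature,
    },
  };
}

const messages = [{ role: "user", content: "Hello!" }];
const a = buildChatRequest("google/gemini-2.0-flash-exp:free", messages);
const b = buildChatRequest("deepseek/deepseek-chat:free", messages);
// a and b target the same endpoint with the same body shape;
// only the model string differs.
```

Swap the string, keep everything else. That is the whole migration story between free models.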
HuggingFace Inference API
HuggingFace offers a free Inference API that lets you run open-source models without managing any infrastructure. The free tier gives you access to thousands of models, though with rate limits that make it better suited for development and low-traffic projects than high-volume production use.
The process is straightforward: pick a model from the HuggingFace Hub, point your code at the Inference API endpoint, and start sending requests. Models like Mistral, Zephyr, and various Llama fine-tunes are all available.
The main limitation is speed. Free-tier inference can be slow, especially for larger models. If a model is not already loaded in memory (what HuggingFace calls a "cold start"), you might wait 30-60 seconds for the first response. Subsequent requests are faster, but the cold start problem makes HuggingFace less ideal for chatbots that need to feel responsive.
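If you do use HuggingFace directly, a simple retry loop smooths over cold starts. The sketch below assumes the free Inference API's behavior of returning HTTP 503 while a model loads; the fetch function is injected so the retry logic stands on its own:

```javascript
// Sketch: retry wrapper for cold starts. doFetch is any function that
// performs the request and resolves to an object with a status field.
async function withColdStartRetry(doFetch, { retries = 5, delayMs = 5000 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const res = await doFetch();
    if (res.status !== 503) return res; // loaded (or a real error): return it
    // Model still loading: wait and try again
    await new Promise((r) => setTimeout(r, delayMs));
  }
  throw new Error("Model did not load in time");
}
```

You would call it as `withColdStartRetry(() => fetch(HF_ENDPOINT, options))`, where `HF_ENDPOINT` is the Inference API URL for your chosen model.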
My recommendation: use HuggingFace for experimentation and model comparison, but run your production chatbot through OpenRouter. The free models there are faster and more reliable for real-time conversation.
Running models locally (the hidden free option)
If you have a decent GPU (8GB VRAM or more), you can run smaller language models entirely on your own hardware. Tools like Ollama make this surprisingly painless. You install it, download a model, and you have a local API endpoint that works just like a cloud API but runs on your machine.
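To illustrate how interchangeable this is with a cloud API, here is a sketch of calling Ollama's local chat endpoint. It assumes Ollama's default port (11434) and its documented `/api/chat` route; the model name is an example:

```javascript
// Sketch: a local Ollama server speaks a very similar dialect to the
// cloud APIs. Build the request once, point it at localhost.
const OLLAMA_URL = "http://localhost:11434/api/chat";

function buildOllamaRequest(model, messages) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages, stream: false }),
  };
}

// Usage (requires a running Ollama instance with the model pulled):
// const res = await fetch(OLLAMA_URL, buildOllamaRequest("llama3.2:3b",
//   [{ role: "user", content: "Hello!" }]));
// const data = await res.json();
// console.log(data.message.content);
```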
Good local models for chatbot use include:
- Llama 3.2 3B: Runs well on modest hardware, surprisingly capable for its size
- Phi-3 Mini: Microsoft's small model that punches above its weight
- Gemma 2 9B: Google's compact model with strong instruction following
The advantage is zero rate limits and complete privacy. The disadvantage is that your chatbot only works when your computer is on, and smaller models are noticeably less capable than the cloud models available through OpenRouter. For most people building their first free chatbot, the cloud API route is simpler and produces better results.
Free hosting options
You have the AI part sorted. Now you need somewhere to run the code that connects your frontend to the API. Here are the options that actually work for chatbot hosting without paying anything.
Static hosting + serverless functions
The simplest architecture for a free chatbot: a static HTML/CSS/JS frontend served from a free static host, with a serverless function handling the API calls.
Free static hosting options:
- GitHub Pages: Free for public repos. Serves static files with a decent CDN. Perfect for the chatbot frontend.
- Cloudflare Pages: Generous free tier with unlimited bandwidth. Includes serverless functions (Workers) so you can handle the backend too.
- Vercel: Free hobby tier. Supports serverless functions natively. The most developer-friendly option if you are using React or Next.js.
For serverless functions (handling the API proxy):
- Cloudflare Workers: 100,000 free requests per day. More than enough for a chatbot. Fast cold starts.
- Vercel Serverless Functions: Free tier includes 100GB-hours of execution time per month.
- AWS Lambda: One million free requests per month. More setup involved, but extremely reliable.
Why you need a backend at all
You might wonder: can you just call the OpenRouter API directly from your frontend JavaScript? Technically yes. Practically, it is a terrible idea. Your API key would be visible in the browser's developer tools. Anyone could extract it and use your key for their own projects, potentially getting you rate-limited or banned.
The serverless function acts as a proxy. Your frontend sends the user's message to your function. The function adds the API key and forwards the request to OpenRouter. The API key never leaves the server. This is a standard pattern and takes only a few dozen lines of code to implement.
```javascript
// Cloudflare Worker: a complete API proxy for OpenRouter
export default {
  async fetch(request, env) {
    // Answer CORS preflight requests so browsers can POST JSON to us
    if (request.method === "OPTIONS") {
      return new Response(null, {
        headers: {
          "Access-Control-Allow-Origin": "*",
          "Access-Control-Allow-Methods": "POST, OPTIONS",
          "Access-Control-Allow-Headers": "Content-Type",
        },
      });
    }

    const { messages, model, temperature, stream } = await request.json();

    const response = await fetch(
      "https://openrouter.ai/api/v1/chat/completions",
      {
        method: "POST",
        headers: {
          "Authorization": "Bearer " + env.OPENROUTER_KEY,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          model: model || "google/gemini-2.0-flash-exp:free",
          messages,
          temperature: temperature ?? 0.7,
          stream: stream ?? false, // stream only when the frontend asks
        }),
      }
    );

    // Relay the upstream body as-is (plain JSON or an SSE stream)
    return new Response(response.body, {
      headers: {
        "Content-Type":
          response.headers.get("Content-Type") || "application/json",
        "Access-Control-Allow-Origin": "*",
      },
    });
  },
};
```
That is a complete, working API proxy. It receives messages from your frontend, forwards them to OpenRouter with your API key (stored as an environment variable), and relays the response back. Deploy it to Cloudflare Workers and your backend is done.
VPS hosting (free tier or cheap)
If you want more control, several VPS providers offer free tiers or always-free instances:
- Oracle Cloud: Genuinely free forever tier with an ARM instance (4 CPU cores, 24GB RAM). This is overkill for a chatbot proxy but useful if you want to run additional services.
- Google Cloud: Free e2-micro instance. Tiny but sufficient for a chatbot API proxy.
A VPS gives you full control. You can run a Python Flask server, a Node.js Express app, or anything else. The tradeoff is that you manage the server yourself — updates, security, uptime. For a free chatbot project, serverless functions are usually the better choice unless you specifically want the learning experience of managing a server.
Architecture: putting it all together
Here is the architecture I recommend for a free AI chatbot in 2026. It is simple, reliable, and every component is free.
```
+-------------------+      +--------------------+      +-------------------+
|    Static Host    |----->|  Serverless Proxy  |----->|    OpenRouter     |
|  (GitHub Pages)   |      | (Cloudflare Work)  |      |    (Free APIs)    |
|                   |<-----|                    |<-----|                   |
|    HTML/CSS/JS    |      |    Adds API key    |      |   Gemini, Llama   |
|      Chat UI      |      |    Handles CORS    |      |   DeepSeek, etc   |
+-------------------+      +--------------------+      +-------------------+
```
Three components. No database. No authentication system. No framework. The frontend is vanilla HTML, CSS, and JavaScript. The proxy is a single serverless function. The AI comes from OpenRouter's free models.
The frontend: building the chat UI
You do not need React. You do not need Vue. You do not need a CSS framework. A chat interface is a text input, a send button, and a scrollable message list. Here is a minimal but functional implementation:
```html
<!-- Minimal chat UI -->
<div id="chat-container">
  <div id="messages"></div>
  <form id="chat-form">
    <input type="text" id="user-input" placeholder="Type a message..." />
    <button type="submit">Send</button>
  </form>
</div>

<script>
  const PROXY_URL = "https://your-worker.workers.dev";
  const history = [];

  document.getElementById("chat-form")
    .addEventListener("submit", async (e) => {
      e.preventDefault();
      const input = document.getElementById("user-input");
      const userMsg = input.value.trim();
      if (!userMsg) return;

      appendMessage("user", userMsg);
      history.push({ role: "user", content: userMsg });
      input.value = "";

      const res = await fetch(PROXY_URL, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          messages: [
            { role: "system", content: "You are a helpful assistant." },
            ...history
          ]
        })
      });

      const data = await res.json();
      const reply = data.choices[0].message.content;
      appendMessage("assistant", reply);
      history.push({ role: "assistant", content: reply });
    });

  function appendMessage(role, text) {
    const div = document.createElement("div");
    div.className = "message " + role;
    div.textContent = text;
    document.getElementById("messages").appendChild(div);
  }
</script>
```
That is about 40 lines and it gives you a working chatbot. The user types a message, it goes to your proxy, the proxy calls OpenRouter, and the response appears in the chat. Add CSS to make it look nice, and you have something you can show people.
Adding streaming for a better experience
The code above waits for the entire response before showing anything. Real chatbots stream the response word by word, which feels much faster even when the total time is the same. OpenRouter supports streaming out of the box.
To implement streaming, you change two things: the proxy needs to forward the stream (the Cloudflare Worker example above already does this), and the frontend needs to read the stream incrementally using the ReadableStream API.
```javascript
// Frontend streaming implementation
const response = await fetch(PROXY_URL, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ messages, stream: true })
});

const reader = response.body
  .pipeThrough(new TextDecoderStream())
  .getReader();

let fullText = "";
let buffer = ""; // SSE lines can be split across network chunks

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += value;
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep any incomplete trailing line for the next read

  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const json = line.slice(6);
    if (json === "[DONE]") break;
    const chunk = JSON.parse(json);
    const token = chunk.choices[0]?.delta?.content;
    if (token) {
      fullText += token;
      updateMessage(fullText);
    }
  }
}

// Re-render the assistant message that is currently streaming in
function updateMessage(text) {
  let div = document.querySelector("#messages .message.assistant:last-child");
  if (!div) {
    div = document.createElement("div");
    div.className = "message assistant";
    document.getElementById("messages").appendChild(div);
  }
  div.textContent = text;
}
```
Streaming makes a dramatic difference in perceived quality. Users see the response forming in real time, which makes the chatbot feel alive rather than stuck thinking behind a loading spinner.
Making it actually good: system prompts and modes
A free chatbot with a generic system prompt is just a worse version of ChatGPT. To make yours worth using, you need to specialize it. The easiest way is through carefully crafted system prompts.
The system prompt is the invisible instruction set that tells the model how to behave. Users never see it, but it determines everything about the chatbot's personality, capabilities, and output style. Here is what a good system prompt looks like versus a bad one:
```
// Bad: vague, generic
"You are a helpful AI assistant. Answer questions accurately."

// Good: specific, behavioral
"You are a senior software engineer who helps with code. Rules:
- Write production-ready code, not toy examples
- Always include error handling
- Use TypeScript unless asked otherwise
- If a question is ambiguous, ask one clarifying question before writing code
- Never apologize or use filler phrases
- Keep explanations under 3 sentences unless asked for more"
```
The specific prompt produces noticeably better outputs because it constrains the model's behavior in useful ways. Every rule eliminates a category of bad responses. For a deep dive on writing effective prompts, see the prompt engineering best practices guide.
Adding multiple modes
The most impactful upgrade you can make to a free chatbot is adding mode switching. Instead of one chatbot that tries to be okay at everything, build a chatbot that switches between specialized configurations. I wrote an entire article on how to build a multi-mode AI chatbot with the full architecture, but here is the core concept:
```javascript
const modes = {
  general: {
    model: "gemini-2.0-flash:free",
    temp: 0.7,
    prompt: "You are a helpful assistant..."
  },
  code: {
    model: "deepseek-v3:free",
    temp: 0.2,
    prompt: "You are a senior engineer..."
  },
  creative: {
    model: "llama-3.3-70b:free",
    temp: 0.95,
    prompt: "You are a creative writer..."
  },
  research: {
    model: "deepseek-r1:free",
    temp: 0.3,
    prompt: "You are a research analyst..."
  }
};
```
Each mode gets its own model, temperature, and system prompt. Code mode uses a low temperature for deterministic output and a model strong at reasoning. Creative mode cranks up the temperature for more varied prose. The user picks a mode from a selector at the top of the chat, and the backend swaps the configuration accordingly.
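On the backend, resolving the selected mode is only a few lines. A sketch, using the same mode object shape as the snippet above; unknown mode names fall back to general so a stale frontend cannot break the proxy:

```javascript
// Build the upstream request from a mode name and conversation history.
function buildModeRequest(modes, name, history) {
  const mode = modes[name] || modes.general; // unknown modes fall back
  return {
    model: mode.model,
    temperature: mode.temp,
    messages: [{ role: "system", content: mode.prompt }, ...history]
  };
}

const modes = {
  general: { model: "gemini-2.0-flash:free", temp: 0.7, prompt: "You are a helpful assistant..." },
  code:    { model: "deepseek-v3:free",      temp: 0.2, prompt: "You are a senior engineer..." }
};

const req = buildModeRequest(modes, "code", [{ role: "user", content: "Fix this bug" }]);
// req now carries the code-mode model, its low temperature, and its
// system prompt ahead of the conversation history.
```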
This single change can make a free chatbot feel premium. Users notice when the chatbot actually adapts to different tasks instead of giving the same generic tone regardless of what they ask.
Handling the limitations of free APIs
Free APIs are free for a reason. They have limitations. Here is how to handle each one so your users never notice.
Rate limits
Every free API has rate limits. OpenRouter's free models typically allow 10-20 requests per minute. During peak hours, you might hit these limits even with moderate traffic.
The solution is fallback chains. Define a priority list of models for each mode. If the primary model returns a 429 (rate limited), try the next one. Most users will never know the switch happened because the fallback models are nearly as good.
```javascript
async function chatWithFallback(messages, mode) {
  // Ordered by preference; all are free OpenRouter model IDs
  const models = [
    "google/gemini-2.0-flash-exp:free",
    "deepseek/deepseek-chat-v3-0324:free",
    "meta-llama/llama-3.3-70b-instruct:free"
  ];

  for (const model of models) {
    try {
      // callOpenRouter is your proxy's request helper
      const res = await callOpenRouter(messages, model, mode.temperature);
      if (res.ok) return res;
      if (res.status === 429) continue; // rate limited: try the next model
      return res; // other errors: surface them instead of masking them
    } catch (e) {
      continue; // network failure: try the next model
    }
  }
  throw new Error("All models unavailable");
}
```
Response quality variation
Free models are good but not perfect. You will occasionally get responses that are too long, poorly formatted, or slightly off-topic. There are three ways to mitigate this:
- Better system prompts: The more specific your instructions, the more consistent the output. Tell the model exactly what you want and what you do not want.
- Post-processing: Clean up common issues automatically. Strip unwanted markdown, truncate overly long responses, remove AI-isms like "Certainly!" and "I'd be happy to help!"
- Model selection: Different models have different strengths. Gemini tends to be concise and well-formatted. DeepSeek is better at reasoning. Llama produces more natural-sounding text. Match the model to the task.
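The post-processing step can be a small pure function. In the sketch below, the filler phrases and the length cap are illustrative; tune both to what your chosen models actually produce:

```javascript
// Sketch: light cleanup for model output before it reaches the user.
function cleanResponse(text, maxChars = 4000) {
  let out = text
    .replace(/^(Certainly!|Sure!|I'd be happy to help!)\s*/i, "") // drop filler openers
    .trim();
  if (out.length > maxChars) {
    out = out.slice(0, maxChars).trimEnd() + "…"; // truncate runaway responses
  }
  return out;
}

console.log(cleanResponse("Certainly! Here is the answer."));
// "Here is the answer."
```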
Context window limits
As conversations get longer, you will hit the model's context window limit. Most free models support 8K-32K tokens of context. A long conversation can easily exceed this.
The simplest fix: maintain a sliding window of conversation history. Keep the system prompt and the last N message pairs. Discard older messages. Most users will not notice because the recent context is what matters for coherent responses.
```javascript
function trimHistory(history, maxPairs) {
  maxPairs = maxPairs || 15;
  if (history.length <= maxPairs * 2) return history;
  // Keep the most recent messages
  return history.slice(-(maxPairs * 2));
}
```
A working example: helloandy.net AI Chat
Everything described in this article is running in production at helloandy.net/ai-chat. It is a free, no-login-required chatbot with 16 specialized modes covering general conversation, code generation, creative writing, research, debate, text humanization, and more.
The chatbot runs entirely on free OpenRouter APIs with automatic model fallback. Responses stream in real time. Conversation history persists in your browser. There are no usage limits beyond the API rate limits, which the fallback system handles transparently.
You can try it right now to see what a free chatbot feels like when it is built with attention to the details covered in this guide. It is not a demo or a prototype — it is the real thing, serving real users, at zero cost.
Understanding AI agents: the next step
Once you have a working chatbot, the natural next question is: can it do things on its own? Can it search the web, call other APIs, or take actions without being explicitly told every step? This is where chatbots evolve into AI agents.
An agent is a chatbot with tools. Instead of just generating text, it can decide to search for information, run code, read files, or interact with external services. The same free APIs that power your chatbot can power an agent — you just add a tool-calling layer on top.
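To make the tool-calling layer concrete, here is the smallest possible sketch. Everything in it is an illustrative convention, not a fixed standard: the model would be prompted to reply with JSON like `{"tool": "...", "args": {...}}` when it wants to act, and the tools themselves are hypothetical placeholders:

```javascript
// Sketch: a minimal tool dispatcher sitting between the model's output
// and the user. Real tools would search the web, call APIs, run code.
const tools = {
  now: () => new Date().toISOString(),  // hypothetical clock tool
  add: ({ a, b }) => a + b,             // hypothetical calculator tool
};

function dispatch(modelOutput) {
  let call;
  try {
    call = JSON.parse(modelOutput);
  } catch {
    return modelOutput; // plain text: no tool requested
  }
  const tool = tools[call.tool];
  return tool ? tool(call.args) : modelOutput;
}

console.log(dispatch('{"tool":"add","args":{"a":2,"b":3}}')); // 5
console.log(dispatch("Just a normal answer")); // "Just a normal answer"
```

The real work in an agent is the loop around this: feeding tool results back into the conversation so the model can decide its next step.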
This is a more advanced topic and not necessary for your first chatbot, but it is worth understanding where the technology is heading. The line between chatbot and agent is blurring fast, and the architecture you build today can evolve into something more capable tomorrow.
Common mistakes and how to avoid them
After building several chatbots and helping others build theirs, here are the mistakes I see most often:
Mistake 1: Exposing the API key in frontend code. This is the most common and most dangerous mistake. Never put your API key in JavaScript that runs in the browser. Always use a serverless function or backend proxy. It takes 10 minutes to set up and prevents your key from being stolen.
Mistake 2: Not handling errors gracefully. Free APIs will occasionally fail. Rate limits, server errors, and network issues are inevitable. Your chatbot needs to handle every one of these with a user-friendly message, not a blank screen or a cryptic error dump.
Mistake 3: Using a framework you don't need. You do not need Next.js, Nuxt, SvelteKit, or any other framework to build a chatbot. Vanilla HTML, CSS, and JavaScript work perfectly. Frameworks add complexity, build steps, and deployment considerations. Start simple. Add complexity only when you have a specific reason.
Mistake 4: Ignoring mobile. Over half your users will be on phones. If your chat interface does not work well on a 375px-wide screen, you have lost half your audience. Test on mobile early and often. The key issues: text input that does not get hidden by the keyboard, messages that wrap properly, and a send button that is easy to tap.
Mistake 5: Making the chatbot try to do everything. A chatbot that is mediocre at 20 things is worse than a chatbot that is excellent at three things. Pick a focus. Build for that. Expand later based on what users actually ask for, not what you think would be cool.
Step-by-step deployment checklist
Here is the complete checklist to go from zero to a deployed free chatbot:
- Get your API key. Sign up at OpenRouter. Copy your key. This takes one minute.
- Write the proxy. Create a Cloudflare Worker (or Vercel function) that forwards requests to OpenRouter with your API key. Deploy it. Test it with curl.
- Build the frontend. An HTML file with a chat form, a message display area, and JavaScript that calls your proxy. Start with the minimal code in this article and expand from there.
- Add streaming. Swap from waiting for full responses to reading the stream incrementally. This is the single biggest UX improvement you can make.
- Write a good system prompt. Spend real time on this. Test it with diverse queries. Iterate until the chatbot consistently produces the kind of responses you want.
- Handle errors. Add try/catch blocks everywhere. Show friendly messages when things fail. Add the fallback model chain.
- Deploy the frontend. Push to GitHub Pages, Cloudflare Pages, or Vercel. Point your domain at it if you have one.
- Test on mobile. Open it on your phone. Fix whatever is broken. There will be something.
- Ship it. Do not wait until it is perfect. Ship it when it works, then iterate based on real usage.
What comes next
A basic chatbot is a starting point. Once it is live and working, here are the upgrades worth considering:
- Multiple conversation modes — covered in detail in the multi-mode chatbot guide
- Conversation persistence — store chat history in localStorage so it survives page refreshes
- Markdown rendering — use a library like marked.js to render formatted responses with code highlighting
- Analytics — add basic tracking to see which features people use and which they ignore
- Custom themes — let users switch between light and dark mode
- Export functionality — let users download their conversation as a text file
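The persistence upgrade, for example, is about ten lines. In this sketch the storage object is passed in, so the same functions work with `window.localStorage` in the browser or a plain object in tests; the "chat-history" key is arbitrary:

```javascript
// Sketch: saving and restoring conversation history across page loads.
const KEY = "chat-history";

function saveHistory(storage, history) {
  storage.setItem(KEY, JSON.stringify(history));
}

function loadHistory(storage) {
  const raw = storage.getItem(KEY);
  return raw ? JSON.parse(raw) : []; // empty history on first visit
}

// In the browser:
// saveHistory(window.localStorage, history);
// const history = loadHistory(window.localStorage);
```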
Every one of these can be implemented without spending money. The free tier of modern web infrastructure is genuinely sufficient for running a chatbot that serves real users.
helloandy.net provides free AI tools and tutorials for developers. No account required.