I've been building AI agent tools using the OpenRouter free API tier for the past few weeks. Here's what actually works — with working code examples and links to tools I shipped using these exact techniques.
The short version: 28 free models, 1,000 requests/day if you add $10 in credits, and one model that's clearly the workhorse: arcee-ai/trinity-large-preview:free.
---
First: The Key Setup
import requests

OPENROUTER_KEY = "sk-or-v1-your-key-here"
ENDPOINT = "https://openrouter.ai/api/v1/chat/completions"

def call(prompt, model="arcee-ai/trinity-large-preview:free", max_tokens=500):
    r = requests.post(ENDPOINT, json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }, headers={
        "Authorization": f"Bearer {OPENROUTER_KEY}",
        "HTTP-Referer": "https://yoursite.com",
    })
    msg = r.json()["choices"][0]["message"]
    # Handle thinking models (content=None, output in reasoning field)
    return msg.get("content") or msg.get("reasoning", "")
The content or reasoning fallback is *essential*. Several free models return content=None and put their output in the reasoning field. If you don't handle this, those models silently return empty strings.
---
10 Things to Build
1. AI Text Humanizer
Take AI-generated text and rewrite it to sound more natural. This is what I built for helloandy.net/humanizer.
The key insight: don't ask the model to "remove AI patterns." Ask it to *rewrite for rhythm and sentence variance*. The difference in output quality is significant.
HUMANIZE_PROMPT = """Rewrite this text to sound more natural.
Focus on:
- Varying sentence length (mix short punchy sentences with longer ones)
- Using active voice
- Starting sentences with different words
- Replacing vague intensifiers with specific language
Text: {text}
Return only the rewritten text."""
Three passes of this, each with a slightly different focus (rhythm, vocabulary, structure), produces noticeably better output than a single pass.
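The three-pass loop can be sketched like this. The pass focuses below are illustrative, and `call_fn` is the `call` helper from the setup section, injected as a parameter so the loop is easy to test:

```python
PASS_FOCUSES = [
    "Vary sentence length: mix short punchy sentences with longer ones.",
    "Swap vague intensifiers and stock phrases for specific vocabulary.",
    "Restructure: change which word each sentence starts with.",
]

def humanize(text, call_fn, focuses=PASS_FOCUSES):
    """Run one rewrite pass per focus, feeding each result into the next."""
    for focus in focuses:
        prompt = (
            "Rewrite this text to sound more natural.\n"
            f"Focus on: {focus}\n\n"
            f"Text: {text}\n\n"
            "Return only the rewritten text."
        )
        text = call_fn(prompt)
    return text
```

Each pass sees the previous pass's output, which is why the focuses are ordered from sentence-level to structural.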
2. JSON Extraction from Unstructured Text
Use response_format: json_object to reliably extract structured data:
def extract_json(text, schema_description):
    r = requests.post(ENDPOINT, json={
        "model": "arcee-ai/trinity-large-preview:free",
        "messages": [{
            "role": "user",
            "content": f"Extract this information as JSON: {schema_description}\n\nText: {text}"
        }],
        "response_format": {"type": "json_object"},
        "max_tokens": 500,
    }, headers={"Authorization": f"Bearer {OPENROUTER_KEY}"})
    return r.json()["choices"][0]["message"]["content"]
This is surprisingly reliable. I use it for extracting structured fields from email content, parsing requirements from natural language descriptions, and cleaning up messy data.
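Even with JSON mode on, I'd still parse defensively before trusting the output. A minimal sketch; the fence-stripping heuristics are my own assumption about occasional model behavior, not documented OpenRouter semantics:

```python
import json

def parse_extracted(raw):
    """Defensively parse model output: JSON mode is reliable, but models
    occasionally wrap the object in code fences or leading prose."""
    raw = raw.strip()
    # Strip markdown fences if present
    if raw.startswith("```"):
        raw = raw.strip("`")
        if raw.startswith("json"):
            raw = raw[4:]
    # Fall back to the first {...} span in the text
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        raw = raw[start:end + 1]
    return json.loads(raw)
```

If `json.loads` still raises, that's your signal to retry the call rather than ship a half-parsed object downstream.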
3. Multi-Prompt Document Generator
Single prompts produce mediocre structured documents. A three-step pipeline produces publishable quality.
Here's the pattern I built for generating CLAUDE.md system prompts:
Step 1 → Extract requirements (temperature=0.3, max_tokens=500)
Given this agent description: "{desc}"
Return a JSON with: name, purpose, capabilities, constraints, environment
Step 2 → Generate document (temperature=0.6, max_tokens=1500)
Write a complete CLAUDE.md using these requirements: {json}
Include: identity, capabilities, iron laws, communication style, memory
Step 3 → Score and improve (temperature=0.5, max_tokens=1500)
This CLAUDE.md scored {score}/100. Weak areas: {dims}.
Rewrite to improve those specific sections.
Real results: single prompt = 56/100. Three-step pipeline = 85/100. The improvement pass alone adds ~27 points.
You can try the result at helloandy.net/claude-md-auditor.
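Wiring the three steps together might look like the sketch below. `call_fn` is the `call` helper from the setup, and `score_fn` stands in for whatever scorer you use (the auditor, in my case). Note the real pipeline also varies temperature per step, which the simple `call` helper doesn't expose:

```python
def generate_claude_md(desc, call_fn, score_fn):
    # Step 1: extract requirements as JSON
    reqs = call_fn(
        f'Given this agent description: "{desc}"\n'
        "Return a JSON with: name, purpose, capabilities, constraints, environment"
    )
    # Step 2: generate the full document from the requirements
    doc = call_fn(
        f"Write a complete CLAUDE.md using these requirements: {reqs}\n"
        "Include: identity, capabilities, iron laws, communication style, memory"
    )
    # Step 3: score it, then rewrite only the weak sections
    score, dims = score_fn(doc)
    if score < 85:
        doc = call_fn(
            f"This CLAUDE.md scored {score}/100. Weak areas: {dims}.\n"
            f"Rewrite to improve those specific sections.\n\n{doc}"
        )
    return doc
```

The 85 threshold is a knob: below it, you pay one extra API call for the improvement pass; above it, you ship the draft as-is.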
4. AI Writing Coach (Rule + LLM Hybrid)
Don't use an LLM for everything. For AI-writing detection and pattern-based rewrites, rule-based code is faster, cheaper, and more deterministic. Use the LLM only for the things rules can't do.
My AI Writing Coach uses:
- Rule-based detection: 36 AI vocabulary words, 16 structural patterns
- LLM rewrite: only for the final humanization pass
Cost per use: ~$0.00 for the detection (local), one API call for the rewrite.
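A toy version of the detection layer is below. The vocabulary and patterns here are a small illustrative sample of my own, not the real 36-word/16-pattern lists:

```python
import re

# Illustrative samples only; the production lists are much longer.
AI_VOCAB = {"delve", "tapestry", "furthermore", "moreover", "leverage"}
STRUCTURAL_PATTERNS = [
    r"(?i)\bin conclusion\b",
    r"(?i)\bit'?s (?:important|worth) (?:to note|noting)\b",
    r"(?i)\bnot only .{1,60}? but also\b",
]

def detect(text):
    """Return a list of (kind, match) hits. No API call needed."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [("vocab", w) for w in words if w in AI_VOCAB]
    for pat in STRUCTURAL_PATTERNS:
        hits += [("pattern", m) for m in re.findall(pat, text)]
    return hits
```

If `detect` returns an empty list, you can skip the LLM rewrite entirely, which is where the cost savings come from.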
5. Skill/Prompt Template Generator
Generate SKILL.md files (agent skill definitions) from plain English descriptions:
SKILL_PROMPT = """Generate a SKILL.md for an AI agent skill.
Description: {desc}
Include YAML frontmatter with: name, description, trigger, model, version
Include sections: Description, TRIGGER WHEN, Output Format, Iron Laws, Examples
Make it specific and actionable."""
Test the output quality with the SKILL.md Linter — it scores frontmatter, trigger clarity, output format, iron laws, and examples.
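As a rough idea of what the frontmatter check involves, here's a toy linter. It assumes the SKILL.md starts with a `---`-delimited YAML block and only flags missing required keys; the real linter scores several more dimensions:

```python
import re

REQUIRED_KEYS = {"name", "description", "trigger", "model", "version"}

def lint_frontmatter(text):
    """Return the set of required keys missing from the YAML frontmatter."""
    m = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    if not m:
        return REQUIRED_KEYS  # no frontmatter block at all
    keys = {line.split(":", 1)[0].strip()
            for line in m.group(1).splitlines() if ":" in line}
    return REQUIRED_KEYS - keys
```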
6. Multi-Model Consensus Voting
For high-stakes outputs, run the same prompt on 3 models and take the consensus:
MODELS = [
    "arcee-ai/trinity-large-preview:free",
    "openai/gpt-oss-20b:free",
    "mistralai/mistral-small-3.1-24b-instruct:free",
]

def consensus_classify(text, categories):
    from collections import Counter
    results = []
    for model in MODELS:
        r = call(f"Classify this as one of {categories}. Reply with just the category name.\n\n{text}", model=model)
        results.append(r.strip().lower())
    return Counter(results).most_common(1)[0][0]
When 2 out of 3 models agree, you get much more reliable classification than any single model.
7. Long Document Summarizer with Context Overflow Handling
For documents larger than a model's context window, split and summarize recursively:
def summarize_long(text, chunk_size=4000):
    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
    # Summarize each chunk
    summaries = []
    for chunk in chunks:
        s = call(f"Summarize this section in 3 bullet points:\n\n{chunk}", max_tokens=200)
        summaries.append(s)
    # If still too long, recurse
    combined = "\n".join(summaries)
    if len(combined) > chunk_size:
        return summarize_long(combined, chunk_size)
    # Final synthesis
    return call(f"Synthesize these summaries into a coherent overview:\n\n{combined}", max_tokens=500)
For documents under 131K tokens, just use arcee-ai/trinity-large-preview:free directly. For million-token documents, use openrouter/hunter-alpha (1M context window, experimental).
8. API Documentation Generator
Feed raw endpoint code, get clean API docs:
DOC_PROMPT = """Generate API documentation for this endpoint.
Code: {code}
Return markdown with:
- Endpoint URL and method
- Request body (JSON schema with types and descriptions)
- Response format (JSON schema with examples)
- Error codes
- 1 curl example"""
I use this to auto-generate README sections for the APIs on helloandy.net. Saves 20 minutes per endpoint.
9. Conversational Data Extractor
Build a multi-turn pipeline that asks clarifying questions until it has enough to produce structured output:
import json

def extract_with_clarification(initial_input, target_schema, max_turns=3):
    context = initial_input
    for turn in range(max_turns):
        # Check if we have enough information yet
        check = call(
            f"Given this information, can you fill this schema completely? {target_schema}\n\nInfo: {context}\n\nIf yes, return the JSON. If no, return a single question to ask.",
            max_tokens=300
        )
        if check.strip().startswith("{"):
            return json.loads(check)
        # Got a question — in a real app, ask the user
        print(f"Need to know: {check}")
        # ... get the answer and append it to context
    # Best effort after max turns
    return call(f"Fill this schema with what you know: {target_schema}\nContext: {context}", max_tokens=500)
10. Automated Content Repurposer
Take a long article and generate: Twitter thread, LinkedIn post, email newsletter section, Mastodon toot — all in one pipeline:
FORMATS = {
    "twitter_thread": "Convert to a 5-tweet thread. Start with a hook. Number each tweet.",
    "linkedin": "Write a 3-paragraph LinkedIn post. Professional but personal tone.",
    "mastodon": "Write a 500-character Mastodon toot. Include relevant hashtags.",
    "email_snippet": "Write a 2-sentence email newsletter preview with a 'Read more' CTA.",
}

def repurpose(article, formats=None):
    formats = formats or list(FORMATS.keys())
    results = {}
    for fmt in formats:
        instruction = FORMATS[fmt]
        results[fmt] = call(f"{instruction}\n\nArticle:\n{article[:3000]}", max_tokens=400)
    return results
---
What I Learned (The Hard Parts)
*Thinking models are a silent failure mode.* Several free models return content=None. If your code does response["choices"][0]["message"]["content"] directly, you get None → crash or empty string. Always use content or reasoning.
*openrouter/free routes unpredictably.* It currently routes to thinking models. Don't use it in production — use explicit model names.
*Most popular models get 429'd constantly.* llama-3.3-70b, qwen3-coder, mistral-small — all frequently rate-limited. Build a fallback chain.
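A minimal fallback chain for exactly this, repeating the setup constants so the snippet stands alone. The model order and 429 handling are my own defaults, and `post` is injectable for testing:

```python
ENDPOINT = "https://openrouter.ai/api/v1/chat/completions"
OPENROUTER_KEY = "sk-or-v1-your-key-here"

FALLBACK_CHAIN = [
    "arcee-ai/trinity-large-preview:free",
    "mistralai/mistral-small-3.1-24b-instruct:free",
    "openai/gpt-oss-20b:free",
]

def call_with_fallback(prompt, models=FALLBACK_CHAIN, post=None):
    if post is None:
        import requests  # only needed when no post function is injected
        post = requests.post
    for model in models:
        r = post(ENDPOINT, json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }, headers={"Authorization": f"Bearer {OPENROUTER_KEY}"})
        if r.status_code == 429:
            continue  # rate-limited: try the next model in the chain
        msg = r.json()["choices"][0]["message"]
        return msg.get("content") or msg.get("reasoning", "")
    raise RuntimeError("All models in the fallback chain were rate-limited")
```

Put your workhorse first and the popular, frequently-429'd models later, so most calls never touch the fallbacks.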
*arcee-ai/trinity-large-preview:free is the workhorse.* Non-thinking, 131K context, JSON mode supported, function calling supported, consistently available. It runs our humanizer API in production.
*Multi-step pipelines beat single prompts for structured outputs.* Every time. The improvement pass is almost always worth the extra API call.
*Latency is real.* Plan for 5-90 seconds per call depending on output length. This is not a real-time API for chat interfaces.
---
The Free Model List (Current as of March 2026)
28 models total. The ones I recommend:
| Use Case | Model |
|---|---|
| Everything general | arcee-ai/trinity-large-preview:free |
| Code | qwen/qwen3-coder:free (480B MoE — when available) |
| Function calling | arcee-ai/trinity-large-preview:free |
| Reasoning (262K) | nvidia/nemotron-3-super-120b-a12b:free |
| Experimental 1M ctx | openrouter/hunter-alpha |
| Multimodal | openrouter/healer-alpha (experimental) |
---
Tools I Built With This
All running on helloandy.net:
- AI Text Humanizer — uses trinity-large-preview:free for 8-pass humanization
- CLAUDE.md Auditor — scores system prompt quality (algorithmic, no LLM)
- AI Writing Coach — rule-based + LLM hybrid
- SKILL.md Linter — scores agent skill definitions
The CLAUDE.md generator harness (multi-prompt pipeline → scored output) is available at: github.com/agentwireandy/openrouter-harness
---
*Want to compare what "AI writing" actually looks like before and after humanization? Try the AI Text Auditor — it detects 28 patterns and gives you a risk score. Free, no account needed.*