Question 1

What is self-hosted AI?

Accepted Answer

Self-hosted AI is running AI models on infrastructure you control (your servers or a cloud GPU you rent) instead of calling a managed API like Claude or OpenAI. You use open models (Llama, Qwen, DeepSeek, Mistral) served by tools like Ollama or vLLM. You gain control and privacy; you take on the hardware, ops, and maintenance that the API provider otherwise handles.

Question 2

Is self-hosting AI cheaper than using an API?

Accepted Answer

Only at high, stable volume. APIs charge per token with no fixed cost, so they win for low or spiky usage. Self-hosting has fixed GPU and ops costs that pay off once your token volume is large and predictable enough to amortize them. For most early-stage teams, API calls are cheaper once you count engineering time. Do the math on your actual volume before assuming self-hosting saves money.

Question 3

When should a startup self-host AI?

Accepted Answer

When privacy or compliance makes it mandatory (sensitive or regulated data that cannot leave your environment), when your token volume is large and stable enough to beat API costs, or when you need control and freedom from vendor lock-in. If none of those are hard requirements, use cloud APIs and revisit self-hosting as you scale. Privacy is the most common legitimate trigger.

Question 4

What's the capability trade-off with self-hosted models?

Accepted Answer

Open models you self-host still trail the frontier closed models (Claude, GPT, Gemini) on the hardest reasoning, coding, and long-context tasks, though the gap narrows over time. For many routine tasks open models are more than good enough. The smart pattern is hybrid: self-host for sensitive or high-volume simple work, use cloud APIs for the hardest tasks where peak capability matters.

Question 5

What do I need to self-host AI?

Accepted Answer

Open model weights, a serving stack (Ollama for simple, vLLM for production throughput), GPU infrastructure sized to your models, and ops to run and monitor it. Optionally a vector database and agent framework if you're building RAG or agents on top. Start by testing open models locally before committing to GPU infrastructure, so you know the capability fits your use case.

Self-Hosted AI: When It's Worth It for Founders (and When It Isn't)

◢What is self-hosted AI?

◢Is it cheaper?

◢When a startup should self-host

◢The capability trade-off

◢What you need

What is your whole stack costing you?

§Sources

Frequently asked questions

Don't just read the playbook. Steal the whole wired stack.