Self-hostable · Open source · Apache 2.0

A full-featured AI platform
you actually own.

Phlox is a self-hostable AI platform — an agentic tool-using harness, document RAG, code execution, an OpenAI-compatible gateway, and per-user cost accounting — running over any model provider: AWS Bedrock or any OpenAI-compatible endpoint, including fully local models.

Runs over
  • AWS Bedrock
  • OpenAI
  • Ollama
  • OpenRouter
  • vLLM
  • LiteLLM
  • LM Studio
localhost:5173
The Phlox chat interface showing a streaming conversation, tool calls, and an artifact panel
~40built-in agent tools
100%self-hostable & offline-capable
8built-in themes
6+model providers, one config

Bring your own model. Or run it all locally.

Named provider profiles cover AWS Bedrock and any OpenAI-compatible endpoint — OpenAI, LiteLLM, or a local runtime. Point Phlox at Ollama, LM Studio, or vLLM and the whole stack — chat and RAG embeddings — runs offline with no cloud API key. Switch profiles live, with a built-in connection tester.

  • Define as many provider profiles as you like, switch between them instantly
  • Local embeddings (e.g. nomic-embed-text) keep RAG fully offline
  • Edit profiles, pricing, and limits live — no server restart required
config.yml
default_profile: local-ollama
profiles:
  local-ollama:
    type: openai
    label: "Ollama (local)"
    endpoint: http://localhost:11434/v1
    api_key: ollama       # ignored by Ollama
    model: qwen3.6:35b
    supports_tools: true
Everything in one app

A complete assistant, not just a chat box

Phlox bundles the pieces you'd otherwise stitch together yourself — each one self-hosted and under your control.

💬

Streaming chat

Conversation history with rename, delete, search & export. Edit/regenerate messages, markdown with highlighted, copyable code, plus Mermaid diagrams and LaTeX math.

🤖

Agentic harness

The model uses tools in a loop — filesystem, shell, Python/Node execution, document search, plus planning, sub-agents, memory, and checkpoints — all in a sandboxed workspace.

🤝

Human-in-the-loop

Pause on sensitive tools, approve or deny, then resume. The run state is persisted, so approvals survive disconnects.

🧰

Code execution & artifacts

Run code with captured output and inline artifacts. A Workspace Files panel lets you browse and download everything the agent created.

📚

Documents & RAG

Upload PDF, DOCX, TXT, MD, or code. Hybrid dense + sparse search over Qdrant with reranking and citations, scoped globally or per conversation. Works offline.

🌐

Opt-in web search

A per-prompt composer toggle exposes web_search (zero-config ddgs or SearXNG) so the agent can discover current sources before fetching pages.

🧠

Cross-conversation memory

Durable facts are saved and semantically recalled across chats, so the assistant remembers you from one conversation to the next.

🖼️

Multimodal

Attach images to messages for vision models, persisted and replayed into the provider as image content parts.

🔌

MCP integration

Connect Model Context Protocol servers from the UI; their tools join the model's toolset automatically, no code required.

🚪

OpenAI-compatible gateway

Mint per-user API keys and call Phlox from any OpenAI SDK via /v1/chat/completions — with the same per-user cost accounting.

💵

Usage & cost accounting

Per-message token and cost in the UI, plus an admin chargeback view by month × user × department × model, with CSV export for finance.

🎨

Theming

Phlox Dark by default, with Light, Fred Hutch, Hutch Night, Sandstone and more — instant switching via a CSS-variable token system.

The agentic core

A real agent, not "chat that calls tools"

Each turn, the model works in a loop — calling tools, planning, and recovering — inside a per-conversation sandboxed workspace you can inspect, snapshot, and roll back.

01 Tool loop

Filesystem (read_file, write_file, edit_file, glob, grep), run_shell, execute_python / execute_node, and search_documents — one unified tool surface the model drives until the task is done.

02 Planning & sub-agents

update_todos keeps a visible plan; spawn_subagent runs a nested, ephemeral agent with a scoped toolset in the same workspace and returns a report.

03 Memory & checkpoints

save_memory persists durable facts across chats. Every workspace is a git repo that auto-snapshots after mutating tools, with one-click restore.

04 Approvals & permissions

Every tool has an auto / ask / deny policy. The loop pauses on ask, persists its state, and resumes statelessly after you decide.

Knowledge & memory

Your documents, searched the right way

Upload PDFs, Office docs, markdown, or source code. Phlox parses, chunks, and embeds them into Qdrant, then retrieves with true hybrid search — a dense semantic vector and a sparse lexical vector per chunk, fused with RRF and reranked, returning numbered citations the model is instructed to cite.

  • Global knowledge base or per-conversation document scoping
  • Dependency-free sparse vectors and reranker work fully offline
  • SQLite stays the source of truth — the index can always be rebuilt
  • Cross-conversation memory recalls durable facts into every turn
1
ParsePDF · DOCX · TXT · MD · code
2
Chunk & embeddense + sparse vectors
3
Hybrid searchRRF fusion across both vectors
4
Rerankcross-encoder-ready seam
5
Citenumbered sources [n]
Built for teams

Multi-user, isolated, and accountable

Auth is on by default, data is scoped strictly per user, and every sensitive tool runs behind a permission gate you control.

🔐

Auth & SSO

Local accounts (bcrypt + JWT) or Microsoft Entra ID SSO. user / admin roles, strict per-user isolation — admins manage accounts but can't read others' content.

📦

Container sandbox

Run agent code in an ephemeral Podman/Docker container with CPU, memory, and PID limits plus network isolation — or a fast local subprocess for trusted single-user use.

🛡️

Per-tool permissions

Each tool is auto, ask, or deny. Mutating and execution tools default to ask; an Agent-mode toggle auto-approves for a single turn.

⚙️

Live admin config

Edit provider profiles, model pricing, resilience, and sandbox limits from an admin panel — applied without a restart. API keys are write-only and masked.

📊

Observability

Per-request structured logs, an optional OpenTelemetry tracing seam, and per-turn token/cost capture in a durable ledger.

💵

Departmental chargeback

A durable usage ledger outlives the accounts it tracks — a departed user's costs stay billable after their account is deleted. Usage by month × user × department × model, CSV-exportable.

Authentication settings showing Microsoft Entra ID single sign-on configuration
Entra ID SSO & local accounts
Sandbox limits panel with container memory, CPU, PID, and network controls
Container sandbox resource limits
Tools and permissions panel with per-tool auto, ask, and deny policies
Per-tool auto · ask · deny policies
The platform layer

An LLM gateway and cost ledger for the whole team

Beyond chat, Phlox is an OpenAI-compatible gateway with per-user API keys, live model pricing, and department-level chargeback — the governance layer that turns a chat app into shared infrastructure.

Usage and cost dashboard grouped by department, user, and model with per-month totals and CSV export
Usage & cost, grouped by month × department × user × model — exportable to CSV for finance.
API keys management panel showing per-user keys for the OpenAI-compatible gateway
Mint per-user API keys — call Phlox from any OpenAI SDK
Model pricing editor with input and output cost per million tokens for each model
Live model pricing — applied to new turns, no restart
Under the hood

Two clean processes

A FastAPI backend handles LLM orchestration, the agent harness, MCP, RAG, code execution, auth, and SQLite persistence. A React + Vite frontend renders the rich, streaming UI.

Frontend React + Vite + Tailwind
  • Zustand store — live streaming assembly
  • SSE stream parser for /api/chat
  • Tool cards, reasoning, inline artifacts
  • CSS-variable theme tokens
Backend FastAPI
  • Resumable agent loop + tool registry
  • Permission gate — the security seam
  • Providers: OpenAI-compatible & Bedrock
  • RAG · sandbox · workspace · MCP
  • SQLite source of truth + Qdrant index

In dev, Vite proxies /api to FastAPI. In production, FastAPI serves the built SPA from frontend/dist — one command to run the whole thing.

Make it yours

Eight themes, instant switching

A semantic CSS-variable token layer means themes change with no rebuild — and adding your own is two small edits.

Phlox Dark
Phlox Light
Fred Hutch
Hutch Night
Dark
Light
Sandstone
+ your own
Up and running in minutes

Quick start

Prerequisites: Python 3.11+ with uv, Node 18+, and a model provider — a local Ollama is the easiest.

1 · Backend
# from backend/
uv sync
cp config.yml.example config.yml   # set your provider
uv run uvicorn app.main:app --reload --port 8000
2 · Frontend
# from frontend/, separate terminal
npm install
npm run dev
# open http://localhost:5173

On Windows run both with ./scripts/dev.ps1; on macOS/Linux ./scripts/dev.sh. Auth is on by default with a seeded admin / adminchange it and set a real jwt_secret before sharing access.

Self-host your own AI assistant today

Open source under Apache 2.0. Clone it, point it at a model, and run.