Self-hostable · Open source · Apache 2.0

A full-featured AI platform
you actually own.

Phlox is a self-hostable AI platform — an agentic tool-using harness, document RAG, code execution, an OpenAI-compatible gateway, and per-user cost accounting — running over any model provider: AWS Bedrock or any OpenAI-compatible endpoint, including fully local models.

View on GitHub Quick start →

Runs over

AWS Bedrock
OpenAI
Ollama
OpenRouter
vLLM
LiteLLM
LM Studio

localhost:5173

The Phlox chat interface showing a streaming conversation, tool calls, and an artifact panel

~40built-in agent tools

100%self-hostable & offline-capable

8built-in themes

6+model providers, one config

Bring your own model. Or run it all locally.

Named provider profiles cover AWS Bedrock and any OpenAI-compatible endpoint — OpenAI, LiteLLM, or a local runtime. Point Phlox at Ollama, LM Studio, or vLLM and the whole stack — chat and RAG embeddings — runs offline with no cloud API key. Switch profiles live, with a built-in connection tester.

Define as many provider profiles as you like, switch between them instantly
Local embeddings (e.g. nomic-embed-text) keep RAG fully offline
Edit profiles, pricing, and limits live — no server restart required

config.yml

default_profile: local-ollama
profiles:
  local-ollama:
    type: openai
    label: "Ollama (local)"
    endpoint: http://localhost:11434/v1
    api_key: ollama       # ignored by Ollama
    model: qwen3.6:35b
    supports_tools: true

Everything in one app

A complete assistant, not just a chat box

Phlox bundles the pieces you'd otherwise stitch together yourself — each one self-hosted and under your control.

💬

Streaming chat

Conversation history with rename, delete, search & export. Edit/regenerate messages, markdown with highlighted, copyable code, plus Mermaid diagrams and LaTeX math.

🤖

Agentic harness

The model uses tools in a loop — filesystem, shell, Python/Node execution, document search, plus planning, sub-agents, memory, and checkpoints — all in a sandboxed workspace.

🤝

Human-in-the-loop

Pause on sensitive tools, approve or deny, then resume. The run state is persisted, so approvals survive disconnects.

🧰

Code execution & artifacts

Run code with captured output and inline artifacts. A Workspace Files panel lets you browse and download everything the agent created.

📚

Documents & RAG

Upload PDF, DOCX, TXT, MD, or code. Hybrid dense + sparse search over Qdrant with reranking and citations, scoped globally or per conversation. Works offline.

🌐

Opt-in web search

A per-prompt composer toggle exposes web_search (zero-config ddgs or SearXNG) so the agent can discover current sources before fetching pages.

🧠

Cross-conversation memory

Durable facts are saved and semantically recalled across chats, so the assistant remembers you from one conversation to the next.

🖼️

Multimodal

Attach images to messages for vision models, persisted and replayed into the provider as image content parts.

🔌

MCP integration

Connect Model Context Protocol servers from the UI; their tools join the model's toolset automatically, no code required.

🚪

OpenAI-compatible gateway

Mint per-user API keys and call Phlox from any OpenAI SDK via /v1/chat/completions — with the same per-user cost accounting.

💵

Usage & cost accounting

Per-message token and cost in the UI, plus an admin chargeback view by month × user × department × model, with CSV export for finance.

🎨

Theming

Phlox Dark by default, with Light, Fred Hutch, Hutch Night, Sandstone and more — instant switching via a CSS-variable token system.

The agentic core

A real agent, not "chat that calls tools"

Each turn, the model works in a loop — calling tools, planning, and recovering — inside a per-conversation sandboxed workspace you can inspect, snapshot, and roll back.

01 Tool loop

Filesystem (read_file, write_file, edit_file, glob, grep), run_shell, execute_python / execute_node, and search_documents — one unified tool surface the model drives until the task is done.

02 Planning & sub-agents

update_todos keeps a visible plan; spawn_subagent runs a nested, ephemeral agent with a scoped toolset in the same workspace and returns a report.

03 Memory & checkpoints

save_memory persists durable facts across chats. Every workspace is a git repo that auto-snapshots after mutating tools, with one-click restore.

04 Approvals & permissions

Every tool has an auto / ask / deny policy. The loop pauses on ask, persists its state, and resumes statelessly after you decide.

Knowledge & memory

Your documents, searched the right way

Upload PDFs, Office docs, markdown, or source code. Phlox parses, chunks, and embeds them into Qdrant, then retrieves with true hybrid search — a dense semantic vector and a sparse lexical vector per chunk, fused with RRF and reranked, returning numbered citations the model is instructed to cite.

Global knowledge base or per-conversation document scoping
Dependency-free sparse vectors and reranker work fully offline
SQLite stays the source of truth — the index can always be rebuilt
Cross-conversation memory recalls durable facts into every turn

ParsePDF · DOCX · TXT · MD · code

Chunk & embeddense + sparse vectors

Hybrid searchRRF fusion across both vectors

Rerankcross-encoder-ready seam

Citenumbered sources [n]

Built for teams

Multi-user, isolated, and accountable

Auth is on by default, data is scoped strictly per user, and every sensitive tool runs behind a permission gate you control.

🔐

Auth & SSO

Local accounts (bcrypt + JWT) or Microsoft Entra ID SSO. user / admin roles, strict per-user isolation — admins manage accounts but can't read others' content.

📦

Container sandbox

Run agent code in an ephemeral Podman/Docker container with CPU, memory, and PID limits plus network isolation — or a fast local subprocess for trusted single-user use.

🛡️

Per-tool permissions

Each tool is auto, ask, or deny. Mutating and execution tools default to ask; an Agent-mode toggle auto-approves for a single turn.

⚙️

Live admin config

Edit provider profiles, model pricing, resilience, and sandbox limits from an admin panel — applied without a restart. API keys are write-only and masked.

📊

Observability

Per-request structured logs, an optional OpenTelemetry tracing seam, and per-turn token/cost capture in a durable ledger.

💵

Departmental chargeback

A durable usage ledger outlives the accounts it tracks — a departed user's costs stay billable after their account is deleted. Usage by month × user × department × model, CSV-exportable.

Authentication settings showing Microsoft Entra ID single sign-on configuration — Entra ID SSO & local accounts

Sandbox limits panel with container memory, CPU, PID, and network controls — Container sandbox resource limits

Tools and permissions panel with per-tool auto, ask, and deny policies — Per-tool `auto · ask · deny` policies

The platform layer

An LLM gateway and cost ledger for the whole team

Beyond chat, Phlox is an OpenAI-compatible gateway with per-user API keys, live model pricing, and department-level chargeback — the governance layer that turns a chat app into shared infrastructure.

Usage and cost dashboard grouped by department, user, and model with per-month totals and CSV export — Usage & cost, grouped by month × department × user × model — exportable to CSV for finance.

API keys management panel showing per-user keys for the OpenAI-compatible gateway — Mint per-user API keys — call Phlox from any OpenAI SDK

Model pricing editor with input and output cost per million tokens for each model — Live model pricing — applied to new turns, no restart

Under the hood

Two clean processes

A FastAPI backend handles LLM orchestration, the agent harness, MCP, RAG, code execution, auth, and SQLite persistence. A React + Vite frontend renders the rich, streaming UI.

Frontend React + Vite + Tailwind

Zustand store — live streaming assembly
SSE stream parser for /api/chat
Tool cards, reasoning, inline artifacts
CSS-variable theme tokens

Backend FastAPI

Resumable agent loop + tool registry
Permission gate — the security seam
Providers: OpenAI-compatible & Bedrock
RAG · sandbox · workspace · MCP
SQLite source of truth + Qdrant index

In dev, Vite proxies /api to FastAPI. In production, FastAPI serves the built SPA from frontend/dist — one command to run the whole thing.

Make it yours

Eight themes, instant switching

A semantic CSS-variable token layer means themes change with no rebuild — and adding your own is two small edits.

Phlox Dark

Phlox Light

Fred Hutch

Hutch Night

Dark

Light

Sandstone

+ your own

Up and running in minutes

Quick start

Prerequisites: Python 3.11+ with uv, Node 18+, and a model provider — a local Ollama is the easiest.

1 · Backend

# from backend/
uv sync
cp config.yml.example config.yml   # set your provider
uv run uvicorn app.main:app --reload --port 8000

2 · Frontend

# from frontend/, separate terminal
npm install
npm run dev
# open http://localhost:5173

On Windows run both with ./scripts/dev.ps1; on macOS/Linux ./scripts/dev.sh. Auth is on by default with a seeded admin / admin — change it and set a real jwt_secret before sharing access.

Go deeper

Self-host your own AI assistant today

Open source under Apache 2.0. Clone it, point it at a model, and run.

View on GitHub Quick start →

A full-featured AI platformyou actually own.

Bring your own model. Or run it all locally.

A complete assistant, not just a chat box

Streaming chat

Agentic harness

Human-in-the-loop

Code execution & artifacts

Documents & RAG

Opt-in web search

Cross-conversation memory

Multimodal

MCP integration

OpenAI-compatible gateway

Usage & cost accounting

Theming

A real agent, not "chat that calls tools"

01 Tool loop

02 Planning & sub-agents

03 Memory & checkpoints

04 Approvals & permissions

Your documents, searched the right way

Multi-user, isolated, and accountable

Auth & SSO

Container sandbox

Per-tool permissions

Live admin config

Observability

Departmental chargeback

An LLM gateway and cost ledger for the whole team

Two clean processes

Eight themes, instant switching

Quick start

Documentation

Architecture

Roadmap

Auth

Sandbox

Observability

API Gateway

MCP

Theming

Self-host your own AI assistant today

A full-featured AI platform
you actually own.