Streaming chat
Conversation history with rename, delete, search & export. Edit/regenerate messages, markdown with highlighted, copyable code, plus Mermaid diagrams and LaTeX math.
Phlox is a self-hostable AI platform — an agentic tool-using harness, document RAG, code execution, an OpenAI-compatible gateway, and per-user cost accounting — running over any model provider: AWS Bedrock or any OpenAI-compatible endpoint, including fully local models.
Named provider profiles cover AWS Bedrock and any OpenAI-compatible endpoint — OpenAI, LiteLLM, or a local runtime. Point Phlox at Ollama, LM Studio, or vLLM and the whole stack — chat and RAG embeddings — runs offline with no cloud API key. Switch profiles live, with a built-in connection tester.
nomic-embed-text) keep RAG fully offlinedefault_profile: local-ollama
profiles:
local-ollama:
type: openai
label: "Ollama (local)"
endpoint: http://localhost:11434/v1
api_key: ollama # ignored by Ollama
model: qwen3.6:35b
supports_tools: true
Phlox bundles the pieces you'd otherwise stitch together yourself — each one self-hosted and under your control.
Conversation history with rename, delete, search & export. Edit/regenerate messages, markdown with highlighted, copyable code, plus Mermaid diagrams and LaTeX math.
The model uses tools in a loop — filesystem, shell, Python/Node execution, document search, plus planning, sub-agents, memory, and checkpoints — all in a sandboxed workspace.
Pause on sensitive tools, approve or deny, then resume. The run state is persisted, so approvals survive disconnects.
Run code with captured output and inline artifacts. A Workspace Files panel lets you browse and download everything the agent created.
Upload PDF, DOCX, TXT, MD, or code. Hybrid dense + sparse search over Qdrant with reranking and citations, scoped globally or per conversation. Works offline.
A per-prompt composer toggle exposes web_search (zero-config ddgs or SearXNG) so the agent can discover current sources before fetching pages.
Durable facts are saved and semantically recalled across chats, so the assistant remembers you from one conversation to the next.
Attach images to messages for vision models, persisted and replayed into the provider as image content parts.
Connect Model Context Protocol servers from the UI; their tools join the model's toolset automatically, no code required.
Mint per-user API keys and call Phlox from any OpenAI SDK via /v1/chat/completions — with the same per-user cost accounting.
Per-message token and cost in the UI, plus an admin chargeback view by month × user × department × model, with CSV export for finance.
Phlox Dark by default, with Light, Fred Hutch, Hutch Night, Sandstone and more — instant switching via a CSS-variable token system.
Each turn, the model works in a loop — calling tools, planning, and recovering — inside a per-conversation sandboxed workspace you can inspect, snapshot, and roll back.
Filesystem (read_file, write_file, edit_file, glob, grep), run_shell, execute_python / execute_node, and search_documents — one unified tool surface the model drives until the task is done.
update_todos keeps a visible plan; spawn_subagent runs a nested, ephemeral agent with a scoped toolset in the same workspace and returns a report.
save_memory persists durable facts across chats. Every workspace is a git repo that auto-snapshots after mutating tools, with one-click restore.
Every tool has an auto / ask / deny policy. The loop pauses on ask, persists its state, and resumes statelessly after you decide.
Upload PDFs, Office docs, markdown, or source code. Phlox parses, chunks, and embeds them into Qdrant, then retrieves with true hybrid search — a dense semantic vector and a sparse lexical vector per chunk, fused with RRF and reranked, returning numbered citations the model is instructed to cite.
[n]Auth is on by default, data is scoped strictly per user, and every sensitive tool runs behind a permission gate you control.
Local accounts (bcrypt + JWT) or Microsoft Entra ID SSO. user / admin roles, strict per-user isolation — admins manage accounts but can't read others' content.
Run agent code in an ephemeral Podman/Docker container with CPU, memory, and PID limits plus network isolation — or a fast local subprocess for trusted single-user use.
Each tool is auto, ask, or deny. Mutating and execution tools default to ask; an Agent-mode toggle auto-approves for a single turn.
Edit provider profiles, model pricing, resilience, and sandbox limits from an admin panel — applied without a restart. API keys are write-only and masked.
Per-request structured logs, an optional OpenTelemetry tracing seam, and per-turn token/cost capture in a durable ledger.
A durable usage ledger outlives the accounts it tracks — a departed user's costs stay billable after their account is deleted. Usage by month × user × department × model, CSV-exportable.
auto · ask · deny policiesBeyond chat, Phlox is an OpenAI-compatible gateway with per-user API keys, live model pricing, and department-level chargeback — the governance layer that turns a chat app into shared infrastructure.
A FastAPI backend handles LLM orchestration, the agent harness, MCP, RAG, code execution, auth, and SQLite persistence. A React + Vite frontend renders the rich, streaming UI.
/api/chatIn dev, Vite proxies /api to FastAPI. In production, FastAPI serves the built SPA from frontend/dist — one command to run the whole thing.
A semantic CSS-variable token layer means themes change with no rebuild — and adding your own is two small edits.
Prerequisites: Python 3.11+ with uv, Node 18+, and a model provider — a local Ollama is the easiest.
# from backend/
uv sync
cp config.yml.example config.yml # set your provider
uv run uvicorn app.main:app --reload --port 8000
# from frontend/, separate terminal
npm install
npm run dev
# open http://localhost:5173
On Windows run both with ./scripts/dev.ps1; on macOS/Linux ./scripts/dev.sh.
Auth is on by default with a seeded admin / admin —
change it and set a real jwt_secret before sharing access.
System map, request lifecycle, module guide — start here.
What's done and what's next across Tiers 1–5.
Local accounts, roles, isolation, and Entra ID SSO setup.
Local vs Podman/Docker container code execution.
Token usage/cost, structured logs, OpenTelemetry tracing.
OpenAI-compatible keys and /v1/* endpoints.
Connecting Model Context Protocol servers.
The theme token system and adding new themes.
Open source under Apache 2.0. Clone it, point it at a model, and run.