Lightweight conversational AI stack with:
- FastAPI backend (LangGraph orchestration + universal OpenAI-compatible model client)
- Local-first memory system (short-term + long-term semantic memory)
- Custom React frontend (no Docker UI dependency)
Frontend (React + Vite) -> FastAPI -> LangGraph -> UniversalChat -> OpenAI-compatible backend
- Chat UI with local conversation history (browser localStorage)
- Model selection from backend
/api/v1/models(fallback/v1/models) - Prompt scene selector + optional system prompt input
- Phase 3 tool calling with planner + safe executor (kb + tickets)
- Memory panel:
- list/add/delete memories
- semantic memory search
- Trace drawer for last assistant response metadata
- OpenAI-compatible chat endpoints remain available:
POST /api/v1/chat/completionsPOST /v1/chat/completions
backend/ FastAPI app + LangGraph + memory modules
frontend/ React + Vite + TypeScript frontend
tests/ Pytest backend tests
docs/MEMORY_SYSTEM.md Memory architecture/spec
docs/FRONTEND.md Frontend architecture/extension notes
run_all.py One-command dev runner (backend + frontend)
- Python 3.11+
- Node.js 18+ (includes npm)
- Conda env
servicebotactivated
- Install backend dependencies:
pip install -r requirements.txt- Install frontend dependencies:
cd frontend
npm install
cd ..- Configure backend env:
copy .env.example .envSet at least:
MODEL_NAMEBASE_URLAPI_KEYEMBEDDING_MODEL(recommended:text-embedding-v4)EMBEDDING_API_KEY(or reuseAPI_KEY)
From project root:
python run_all.pyThis starts:
- Backend:
http://localhost:8000 - Frontend:
http://localhost:3000
Frontend reads:
VITE_API_BASE_URL(defaulthttp://localhost:8000)
Example:
copy frontend\\.env.example frontend\\.env- Health:
GET /health - Models:
GET /api/v1/models(fallbackGET /v1/models) - Chat:
POST /api/v1/chat/completions - Memory:
POST /api/v1/memoryGET /api/v1/memory?user_id=...DELETE /api/v1/memory/{memory_id}POST /api/v1/memory/search
- Prompt scenes:
GET /api/v1/prompt-scenes
What it does:
- Adds intent routing and ReAct-style tool planning before final response generation.
- Executes local-first tools safely (allowlist, timeout, rate limit).
- Builds final prompt dynamically from memory + tool context + user message.
- Keeps OpenWebUI compatibility (
/v1/models,/v1/chat/completions) while running tools internally.
Enable/disable tools:
- Set
TOOLS_ENABLED=true|falsein.env.
Configure tool storage paths:
KB_FILE_PATH=./data/kb/faq.jsonTICKET_DB_PATH=./data/tickets/tickets.db
Add a new tool:
- Add schema models in
backend/tools/schemas.py. - Implement and register the tool in
backend/tools/builtin.py. - Add tool name to
TOOLS_ALLOWLIST. - Extend
backend/tools/planner.pyrouting logic if auto-selection is needed. - Add focused tests in
tests/.
Create ticket:
curl -X POST http://localhost:8000/v1/chat/completions ^
-H "Content-Type: application/json" ^
-d "{\"model\":\"qwen-plus\",\"user\":\"demo_user\",\"messages\":[{\"role\":\"user\",\"content\":\"I want to open a support ticket for my internet not working\"}]}"KB policy query:
curl -X POST http://localhost:8000/v1/chat/completions ^
-H "Content-Type: application/json" ^
-d "{\"model\":\"qwen-plus\",\"user\":\"demo_user\",\"messages\":[{\"role\":\"user\",\"content\":\"What is your refund policy?\"}]}"Inspect trace with internal chat endpoint:
curl -X POST http://localhost:8000/api/v1/chat ^
-H "Content-Type: application/json" ^
-d "{\"user_id\":\"demo_user\",\"message\":\"I want to open a support ticket for my internet not working\"}"- Keep OpenWebUI pointing to this backend OpenAI-compatible base URL.
- Start chat in browser and ask:
I want to open a support ticket for my internet not workingWhat is your refund policy?
- Backend internally plans and executes tools; OpenWebUI remains unaware of internal tool mechanics.
- If OpenWebUI sends
stream=true, backend currently returns a single content chunk and[DONE].
- Start all services:
python run_all.py- Check backend health:
curl http://localhost:8000/health- Check models:
curl http://localhost:8000/api/v1/models- Open browser:
http://localhost:3000
- Chat test:
- Select model
- Send a user message
- Confirm assistant response appears
- Memory test:
- Add memory in Memory panel
- Search memory in Memory panel
- Confirm stored items list updates
Run backend tests:
pytestFocused suites:
pytest tests/test_api.py tests/test_graph.py tests/test_memory_api.py tests/test_memory_store.py -qcd frontend
npm run build- Port conflict:
- Ensure
8000and3000are free beforepython run_all.py.
- Ensure
- Backend up but no models in UI:
- Verify
GET /api/v1/modelsreturns data.
- Verify
- Memory panel empty:
- Verify
.envhas memory enabled and backend logs show memory services initialized.
- Verify
- Frontend cannot reach backend:
- Check
frontend/.envVITE_API_BASE_URLand browser devtools network errors.
- Check