
Agents are helping us grow

QuickStock, Tincture, Maestro, and our agent fleet now report LLM token cost into one per-app rollup. Plus a Discord approval gate for our agents and a self-updating model registry.

Self-updating model lists

Hardcoded model IDs go stale fast. We stood up a model-discovery agent that queries each provider, picks the current model per family, and publishes a registry our apps read at runtime, with a bundled fallback for when the fetch fails. The chat model picker and the fleet's grantable-model list both source from it now.
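As a rough illustration of the read-with-fallback pattern described above, here is a minimal sketch. The registry shape (family name mapped to model ID), the placeholder IDs, and the names `loadRegistry`, `Fetcher`, and `BUNDLED_REGISTRY` are all assumptions made for the example, not the format our apps actually use.

```typescript
// Hypothetical registry shape: model family -> current model ID.
type ModelRegistry = Record<string, string>;

// Minimal response surface the loader relies on; the global fetch Response satisfies it.
interface RegistryResponse {
  ok: boolean;
  json(): Promise<unknown>;
}
type Fetcher = (url: string) => Promise<RegistryResponse>;

// Bundled fallback shipped with the app. Placeholder IDs, not real models.
const BUNDLED_REGISTRY: ModelRegistry = {
  "family-a": "family-a-model-1",
  "family-b": "family-b-model-1",
};

// Accept only a non-empty object whose values are all strings.
function isRegistry(value: unknown): value is ModelRegistry {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    Object.keys(value).length > 0 &&
    Object.values(value).every((id) => typeof id === "string")
  );
}

// Fetch the published registry; fall back to the bundled copy on a network
// error, a non-2xx response, or a body that fails validation.
async function loadRegistry(url: string, fetcher: Fetcher): Promise<ModelRegistry> {
  try {
    const res = await fetcher(url);
    if (!res.ok) return BUNDLED_REGISTRY;
    const body = await res.json();
    return isRegistry(body) ? body : BUNDLED_REGISTRY;
  } catch {
    return BUNDLED_REGISTRY;
  }
}
```

In an app you would pass the global `fetch` as the `fetcher` argument. Injecting it keeps the fallback path easy to exercise in tests without a network.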

Token auditing

We can now see exactly what every ChameleonLabs app spends on LLM tokens, broken down per product and accurate back to each app's launch. QuickStock, Tincture, Maestro, and our managed-agent fleet all report into one cross-app cost rollup.
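To make the idea of a per-app rollup that stays accurate over time concrete, here is a small sketch. It assumes each usage record carries a date and is costed at whatever rate was in effect on that date. The record and rate shapes, the per-million-token pricing unit, and the function names are illustrative assumptions, not our actual schema.

```typescript
// Hypothetical usage record reported by an app.
interface UsageRecord {
  app: string;
  at: string; // ISO date such as "2024-02-01"; same-format ISO dates compare correctly as strings
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical pricing window: rates that apply from `effectiveFrom` onward.
interface RateWindow {
  effectiveFrom: string; // ISO date, same format as UsageRecord.at
  inputPerMillion: number;
  outputPerMillion: number;
}

// Pick the latest window that started on or before `at`.
function rateAt(windows: RateWindow[], at: string): RateWindow {
  let chosen: RateWindow | undefined;
  for (const w of windows) {
    if (w.effectiveFrom <= at && (!chosen || w.effectiveFrom > chosen.effectiveFrom)) {
      chosen = w;
    }
  }
  if (!chosen) throw new Error(`no rate in effect at ${at}`);
  return chosen;
}

// Sum cost per app, pricing each record at the rate in effect on its date.
function costByApp(records: UsageRecord[], windows: RateWindow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    const rate = rateAt(windows, r.at);
    const cost =
      (r.inputTokens / 1_000_000) * rate.inputPerMillion +
      (r.outputTokens / 1_000_000) * rate.outputPerMillion;
    totals.set(r.app, (totals.get(r.app) ?? 0) + cost);
  }
  return totals;
}
```

A real rollup would also key rates by model; the sketch uses a single rate table to keep the date lookup in focus.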

Every app reports its own cost now

Until this week, only Maestro showed up in our "By App" cost view. We wired QuickStock and Tincture to report per-call token usage to ChameleonLabs over an HMAC-signed endpoint, then backfilled historical pricing so older spend is costed at the rates that were actually in effect at the time.
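For readers unfamiliar with HMAC-signed requests, the general technique looks something like the sketch below. It uses Node's built-in `crypto` module. The choice of SHA-256, the `timestamp.body` signing string, and the function names are assumptions for illustration; the post does not describe our actual signing scheme.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: sign the timestamp and raw body with a shared secret.
// Including a timestamp lets the receiver reject stale (replayed) requests.
function signReport(secret: string, timestamp: string, body: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
}

// Receiver side: recompute the signature and compare in constant time.
function verifyReport(secret: string, timestamp: string, body: string, signature: string): boolean {
  const expected = Buffer.from(signReport(secret, timestamp, body), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === given.length && timingSafeEqual(expected, given);
}
```

The sender would attach the timestamp and signature as request headers, and the receiver would verify them against the raw request body before parsing it.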

Getting there meant fixing two bugs that had been quietly hiding spend. In one of them, sub-app usage reports fired a fetch without awaiting it. On Amplify's Lambda runtime, an un-awaited request is frozen and dropped the moment the handler returns, so the reports never arrived.
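The un-awaited report is easy to reproduce in miniature. The sketch below contrasts the two handler shapes; `UsageReport`, `Send`, and both handler names are hypothetical stand-ins, and the sender is injected so the example runs without a network.

```typescript
// Hypothetical report payload and sender signature.
interface UsageReport {
  app: string;
  totalTokens: number;
}
type Send = (report: UsageReport) => Promise<void>;

// Broken shape: the report is started but not awaited. In a serverless
// runtime the environment can be frozen as soon as the handler returns,
// so the in-flight request may never complete.
async function handleWithoutAwait(send: Send): Promise<string> {
  void send({ app: "example-app", totalTokens: 123 });
  return "response";
}

// Fixed shape: the handler does not return until the report has been sent.
async function handleWithAwait(send: Send): Promise<string> {
  await send({ app: "example-app", totalTokens: 123 });
  return "response";
}
```

Awaiting adds the report's latency to the request. Whether a failed report should also fail the request is a separate design choice that the sketch leaves open.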

A real approval gate for the agent fleet

Our managed agents can take real action, so "needs approval" had to become something a human actually clicks. Aegis intercepts high-risk verbs and turns them into a Discord prompt with Allow once / this session / always / Decline, parks the work, and resumes it on your answer. One-way webhooks couldn't carry buttons or take a response back, so this closes that gap.
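The park-and-resume flow can be sketched with a promise that stays pending until a human answers. This is an in-memory illustration only: the `ApprovalGate` class, its method names, and the decision strings are assumptions for the example, and it says nothing about how Aegis persists parked work or talks to Discord.

```typescript
type Decision = "allow_once" | "allow_session" | "allow_always" | "decline";
type Prompt = (requestId: string, verb: string) => void;

class ApprovalGate {
  private prompt: Prompt;
  private pending = new Map<string, (d: Decision) => void>();
  private sessionAllowed = new Set<string>(); // "sessionId:verb"
  private alwaysAllowed = new Set<string>(); // verb
  private nextId = 0;

  // `prompt` stands in for whatever posts the buttons to a human.
  constructor(prompt: Prompt) {
    this.prompt = prompt;
  }

  // Run a high-risk action, parking it until a human decides if needed.
  async run<T>(sessionId: string, verb: string, action: () => Promise<T>): Promise<T> {
    const sessionKey = `${sessionId}:${verb}`;
    if (!this.alwaysAllowed.has(verb) && !this.sessionAllowed.has(sessionKey)) {
      const requestId = `req-${this.nextId++}`;
      const decision = await new Promise<Decision>((resolve) => {
        this.pending.set(requestId, resolve); // park: nothing runs until answer()
        this.prompt(requestId, verb);
      });
      if (decision === "decline") throw new Error(`declined: ${verb}`);
      if (decision === "allow_session") this.sessionAllowed.add(sessionKey);
      if (decision === "allow_always") this.alwaysAllowed.add(verb);
    }
    return action();
  }

  // Called when the human clicks a button; resumes the parked work.
  // Returns false if the request is unknown or was already answered.
  answer(requestId: string, decision: Decision): boolean {
    const resume = this.pending.get(requestId);
    if (!resume) return false;
    this.pending.delete(requestId);
    resume(decision);
    return true;
  }
}
```

A production gate would need to store pending requests durably so that parked work survives a restart, which this sketch does not attempt.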

Alongside it we replaced the flat admin gate on roughly thirty fleet routes with three permission tiers (platform-admin, host-admin, member), and added a visual workflow designer where you wire agents and read-only host operations into runnable, durable workflows and watch them execute live. We also tracked down an orchestrator bug where a sub-agent calling a custom tool would hang the whole delegation for about twelve minutes; the stream-routing guard was matching the wrong event type.
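One simple way to model three tiers in place of a single admin flag is an ordered list with a rank comparison, sketched below. This assumes the tiers are strictly hierarchical (platform-admin includes host-admin, which includes member), which the post does not state. The route paths and the deny-by-default behavior are likewise hypothetical.

```typescript
// Tiers in ascending order of privilege (assumed to be strictly hierarchical).
const TIERS = ["member", "host-admin", "platform-admin"] as const;
type Tier = (typeof TIERS)[number];

// True when `actual` is at least as privileged as `required`.
function hasTier(actual: Tier, required: Tier): boolean {
  return TIERS.indexOf(actual) >= TIERS.indexOf(required);
}

// Hypothetical route table; real route names are not given in the post.
const REQUIRED_TIER: Record<string, Tier> = {
  "/fleet/agents": "member",
  "/fleet/hosts/restart": "host-admin",
  "/fleet/settings": "platform-admin",
};

// Routes missing from the table fall back to the highest tier.
function canAccess(tier: Tier, route: string): boolean {
  return hasTier(tier, REQUIRED_TIER[route] ?? "platform-admin");
}
```

Defaulting unknown routes to the highest tier means a route added without a table entry fails closed rather than open.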

Keeping Maestro's answers fresh

We shipped a weekly librarian agent that keeps Maestro's RAG knowledge base honest. It reindexes the corpus in production, replays ten golden support questions to prove the answers still hold, and opens documentation PRs when it finds gaps. Maestro also gained a knowledge-base-first support persona that can file a support ticket and escalate to a human when it can't answer, and those unanswered questions feed the librarian's docs work, so the system gets better at answering over time.
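A golden-question replay can be as simple as asking each question and checking the answer for phrases it must contain. The sketch below takes that approach as an assumption; the post does not say how the librarian actually judges an answer, and the `GoldenQuestion` shape, `Ask` signature, and `replayGolden` name are hypothetical.

```typescript
// Hypothetical golden question: the answer must mention every listed phrase.
interface GoldenQuestion {
  question: string;
  mustMention: string[];
}

interface GoldenResult {
  question: string;
  passed: boolean;
  missing: string[]; // phrases the answer failed to mention
}

// Stand-in for whatever queries the knowledge base.
type Ask = (question: string) => Promise<string>;

// Replay each question in order and record which expected phrases are missing.
async function replayGolden(questions: GoldenQuestion[], ask: Ask): Promise<GoldenResult[]> {
  const results: GoldenResult[] = [];
  for (const q of questions) {
    const answer = (await ask(q.question)).toLowerCase();
    const missing = q.mustMention.filter((phrase) => !answer.includes(phrase.toLowerCase()));
    results.push({ question: q.question, passed: missing.length === 0, missing });
  }
  return results;
}
```

The failed results, with their missing phrases, are the kind of signal that could seed a documentation PR.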

We also caught that production reindexing had quietly been a no-op. Next.js only bundles files it sees imported, so the deployed route couldn't read our docs folder. It does now.

If you're a ChameleonLabs customer, most of this is plumbing you'll never see. It's the plumbing that lets us price features honestly and lets our agents act without someone hovering over them. Next up is widening cost reporting to the remaining apps and putting the per-app numbers in front of the team every day.
