This Week: Comic Page Layouts, Gemini Omni Video and Agent Hardening

Nine PRs merged: ImageAI gained comic-style page layouts, a layout CLI and Gemini Omni video with conversational editing, and we hardened the agent system behind ChameleonLabs.

Nine PRs merged this week, and six of them landed in ImageAI. Here's what actually shipped.

Comic-style page layouts in ImageAI

ImageAI's Layout tab can now build real comic pages: non-rectangular panels (concave and curved shapes included), angled gutters, bleed and borderless frames, plus text tools for the comic itself. You can let the AI designer rough out a page from a description or author every panel by hand. We broke the work into five sub-projects and ran each one spec first, then plan, then test-driven subagents, with a review after every task and a full Opus review at the end. It all merged as a single PR (#26), which felt slightly absurd and also great.

Then we put a CLI on it (#27). The layout engine used to be GUI-only. Now three flags drive it directly: --layout-design asks an LLM to generate a layout project from a description, --layout-fill renders the panels and --layout-export writes the output. Design and fill don't even need Qt.
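To make the three-stage shape concrete, here is a minimal sketch of how flags like these could be wired with argparse. Only the flag names come from the post. The program name, the idea that each flag takes a single value, the helper functions and the guess that export is the stage that needs Qt are all assumptions for illustration, not ImageAI's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical program name; only the three flag names come from the post.
    parser = argparse.ArgumentParser(prog="layout-cli-sketch")
    # Assumption: each flag takes one value. The real argument shapes may differ.
    parser.add_argument("--layout-design", metavar="DESCRIPTION",
                        help="ask an LLM to generate a layout project from a description")
    parser.add_argument("--layout-fill", metavar="PROJECT",
                        help="render the panels of a layout project")
    parser.add_argument("--layout-export", metavar="OUTPUT",
                        help="write the output")
    return parser


def planned_steps(args: argparse.Namespace) -> list[str]:
    """Return the requested stages in design -> fill -> export order."""
    steps = []
    if args.layout_design is not None:
        steps.append("design")
    if args.layout_fill is not None:
        steps.append("fill")
    if args.layout_export is not None:
        steps.append("export")
    return steps


def needs_qt(steps: list[str]) -> bool:
    # The post says design and fill run without Qt. Treating export as the
    # Qt-dependent stage is an inference, not something the post states.
    return "export" in steps
```

Keeping the "which stages were requested" decision separate from the rendering code is one way a headless run could avoid importing a GUI toolkit until a stage actually needs it.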

We also fixed the small stuff that makes a tool pleasant to live in (#30). The Layout tab now remembers your LLM model, content kind, style role, orientation and dropdown choices between sessions. That PR also killed a long-standing hang when running bare pytest. (A collection hang, the worst kind. Nothing tells you anything.)

If that sounds interesting, it's in ImageAI now.

Gemini Omni video, end to end

ImageAI picked up a fourth video provider: Gemini Omni (gemini-omni-flash-preview), driven through Google's Interactions API (#29). It handles text-to-video, image-to-video and conversational editing at 720p and 24fps, producing 3 to 10 second clips with audio in 16:9 or 9:16. We confirmed it working end to end against the live API before merging.

Conversational editing needed a trigger in the UI, so #31 added one: right-click an Omni-generated clip and pick "Refine (Omni)" to keep editing the same video across turns. And #33 closed every gap we found in an audit against Google's docs: up to three reference images per generation, video editing, task inference and URI delivery, plus matching refine flags on the CLI.
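The limits quoted in this section (3 to 10 seconds, 16:9 or 9:16, up to three reference images) are the kind of thing worth checking before a request leaves the machine. The sketch below is a hypothetical pre-flight validator, not ImageAI's code and not Google's API: the class name, field names and the 5-second default are all assumptions, and it makes no network calls.

```python
from dataclasses import dataclass

# Limits as stated in the post.
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")
MIN_SECONDS, MAX_SECONDS = 3, 10
MAX_REFERENCE_IMAGES = 3


@dataclass(frozen=True)
class OmniVideoRequest:
    """Hypothetical request shape; field names are illustrative only."""
    prompt: str
    duration_seconds: int = 5          # assumed default, not from the post
    aspect_ratio: str = "16:9"
    reference_images: tuple[str, ...] = ()

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the request looks valid."""
        issues = []
        if not self.prompt.strip():
            issues.append("prompt is empty")
        if not MIN_SECONDS <= self.duration_seconds <= MAX_SECONDS:
            issues.append(
                f"duration must be {MIN_SECONDS} to {MAX_SECONDS} seconds, "
                f"got {self.duration_seconds}"
            )
        if self.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            issues.append(f"aspect ratio must be one of {ALLOWED_ASPECT_RATIOS}")
        if len(self.reference_images) > MAX_REFERENCE_IMAGES:
            issues.append(f"at most {MAX_REFERENCE_IMAGES} reference images allowed")
        return issues
```

Returning a list of problems rather than raising on the first one lets a UI or CLI report everything wrong with a request in one pass.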

Agent-system hardening

On the ChameleonLabs side, we worked through both slates from our July 3rd agent-system improvement scan. The Do-now pass (#275) covered security seams, a reliability watchdog and Fable plumbing. Every finding was adversarially verified against the code before we changed anything, and it's already deployed to our ops host and running healthy with zero failed units. The Do-soon pass (#276) followed the next day: all 13 items, plus one more we found along the way. The biggest piece was consolidating shared Python code into one library, which grew from 3 modules with a single consumer to 7 modules used across the system. A workflow watchdog, deploy tooling and a Fable model registry rounded it out.

One last fix (#278): our press pipeline was logging 404 warnings for two personal repos during deploy smoke runs. Turned out the token was fine. The pipeline had copied a pinned repo list from another tool but not the logic that picks the right PAT for lelandg/* repos. Now it does.
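For readers who want to picture the missing piece, here is a minimal sketch of per-repo token selection. The lelandg/* pattern comes from the post. The environment variable names, the default, the example repo names and the function itself are hypothetical and stand in for whatever the real pipeline does.

```python
from fnmatch import fnmatchcase

# (pattern, environment variable holding the token). Names are hypothetical.
TOKEN_RULES = [
    ("lelandg/*", "PERSONAL_PAT"),
]
DEFAULT_TOKEN_ENV = "ORG_PAT"  # hypothetical fallback for every other repo


def token_env_for(repo: str) -> str:
    """Return the name of the env var whose token should be used for `repo`.

    `repo` is an "owner/name" string. Matching is done on the lowercased
    value because GitHub owner names are case-insensitive.
    """
    normalized = repo.lower()
    for pattern, env_name in TOKEN_RULES:
        if fnmatchcase(normalized, pattern):
            return env_name
    return DEFAULT_TOKEN_ENV
```

With the rule table in one place, copying a pinned repo list into another tool also means copying the lookup, which is the part that went missing here.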

That's the week.
