Case study - Personal project
PipelineAI - Autonomous multi-agent research
A system that decomposes a question into 4-6 sub-questions, runs researchers in parallel, has a critic review them, then synthesizes. Four Docker services, a live DAG, and a cost discipline that avoids billing surprises.
- DAG stages: 4
- Services orchestrated: 4
- Cosine dedup threshold: 0.92
- Critic rounds: 2 max
The brief
A single LLM agent is powerful but limited. Hand it a complex question and it charges off in a single direction, ignores contradictions, and hands back a plausible but opaque draft. PipelineAI explores the other approach: decompose, parallelise, critique, synthesize.
A personal project pushing multi-agent orchestration into a real distributed system (Go + Postgres + Redis + Next.js), not a single-thread Python script.
The challenge
When you fan out N researchers in parallel on the same question, you get duplicates, silent contradictions, and an LLM bill that explodes. You need to dedup findings, surface conflicts without averaging them away, and account for every token so a review loop can't spiral indefinitely.
And the whole thing has to stay legible: the user wants to watch the pipeline reason, not stare at a spinner for 90 seconds.
The solution
A 4-stage DAG (Planner → Researchers x N → Critic → Synthesizer) governed by a hard cap: the critic can re-trigger an investigation, but never more than 2 rounds. Unresolved contradictions surface in the final report, never hidden.
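A minimal sketch of that cap, in the spirit of the orchestrator. The types and the runCritic / runResearchers helpers are illustrative stand-ins, not the project's actual API:

```go
// Sketch of the critic round cap. Finding, CriticVerdict, runCritic and
// runResearchers are illustrative stand-ins for the real pipeline types.
package pipeline

import "context"

type Finding struct{ Claim, Source string }

type CriticVerdict struct {
	Contradictions []string // unresolved conflicts, kept for the report
	FollowUps      []string // targeted sub-questions to re-investigate
}

// Stand-ins for the real LLM-backed calls.
var (
	runCritic      func(context.Context, []Finding) (CriticVerdict, error)
	runResearchers func(context.Context, []string) ([]Finding, error)
)

const maxCriticRounds = 2

// criticPhase runs the critic at most maxCriticRounds times. Anything
// still contradictory after the cap is returned, not retried: the
// synthesizer surfaces it in the final report.
func criticPhase(ctx context.Context, findings []Finding) ([]Finding, []string, error) {
	var unresolved []string
	for round := 1; round <= maxCriticRounds; round++ {
		verdict, err := runCritic(ctx, findings)
		if err != nil {
			return nil, nil, err
		}
		unresolved = verdict.Contradictions
		if len(verdict.FollowUps) == 0 {
			break // nothing left worth re-investigating
		}
		extra, err := runResearchers(ctx, verdict.FollowUps)
		if err != nil {
			return nil, nil, err
		}
		findings = append(findings, extra...)
	}
	return findings, unresolved, nil
}
```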
Findings are embedded (Voyage, 1024d) and deduplicated against the session via pgvector at cosine similarity >= 0.92. System prompts use Anthropic's prompt caching. Every call is tracked per tier (input / output / cache-read / cache-write) with a configurable spend alert.
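The dedup check could look like the sketch below (pgvector-go over database/sql). The findings table and its columns are assumptions, and since pgvector's <=> operator returns cosine distance, the 0.92 similarity floor becomes a 0.08 distance ceiling:

```go
// Dedup sketch. Table and column names are assumed; <=> is pgvector's
// cosine *distance* operator, so similarity >= 0.92 means distance <= 0.08.
package pipeline

import (
	"context"
	"database/sql"

	"github.com/pgvector/pgvector-go"
)

const dedupSimilarity = 0.92

// isDuplicate reports whether an embedding is too close to any finding
// already stored for this session.
func isDuplicate(ctx context.Context, db *sql.DB, sessionID string, emb []float32) (bool, error) {
	var exists bool
	err := db.QueryRowContext(ctx, `
		SELECT EXISTS (
		    SELECT 1 FROM findings
		    WHERE session_id = $1
		      AND embedding <=> $2 <= $3
		)`, sessionID, pgvector.NewVector(emb), 1-dedupSimilarity).Scan(&exists)
	return exists, err
}
```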
The pipeline
Four stages, orchestrated by the Go backend with sync.WaitGroup for the parallel phase. Events are published to Redis pub/sub, then streamed via SSE to the DAG visualizer on the frontend.
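A sketch of the fan-out, with an illustrative channel name and event payload. Each goroutine writes only its own slice index, which is why no mutex is needed:

```go
// Fan-out sketch: one goroutine per sub-question, joined with a
// WaitGroup, status events published to Redis for the SSE stream.
// The channel name, event shape and research helper are illustrative.
package pipeline

import (
	"context"
	"fmt"
	"sync"

	"github.com/redis/go-redis/v9"
)

type Finding struct{ Claim, Source string }

var research func(context.Context, string) []Finding // stand-in for the real loop

func runParallel(ctx context.Context, rdb *redis.Client, subQs []string) [][]Finding {
	results := make([][]Finding, len(subQs))
	var wg sync.WaitGroup
	for i, q := range subQs {
		wg.Add(1)
		go func(i int, q string) {
			defer wg.Done()
			rdb.Publish(ctx, "pipeline:events",
				fmt.Sprintf(`{"node":"researcher-%d","state":"running"}`, i))
			results[i] = research(ctx, q) // web_search -> extract claims -> embed
			rdb.Publish(ctx, "pipeline:events",
				fmt.Sprintf(`{"node":"researcher-%d","state":"done"}`, i))
		}(i, q)
	}
	wg.Wait() // the critic only starts once every researcher has joined
	return results
}
```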
Planner
Stage 1: Decomposes the initial question into 4 to 6 independent sub-questions. A single Claude call, schema-validated output (sketched after the stage list).
Researchers
Stage 2 (parallel): N goroutines, each running its own web_search -> extract claims -> embed loop. No locks: Postgres handles concurrency and pgvector dedups.
Critic
Stage 3: Cross-reads findings, detects contradictions, and may trigger a targeted re-investigation. Capped at 2 rounds to avoid expensive loops.
Synthesizer
Stage 4: Assembles the final report: claims with sources, a per-claim confidence score, and unresolved contradictions surfaced explicitly.
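As referenced in the planner stage, a sketch of the output validation; the Plan struct and its field name are assumptions about the schema:

```go
// Illustrative shape of the planner's schema-validated output.
package pipeline

import (
	"encoding/json"
	"fmt"
)

type Plan struct {
	SubQuestions []string `json:"sub_questions"`
}

// parsePlan rejects the planner's output before any researcher is
// spawned: invalid JSON or a count outside 4-6 fails fast.
func parsePlan(raw []byte) (Plan, error) {
	var p Plan
	if err := json.Unmarshal(raw, &p); err != nil {
		return Plan{}, fmt.Errorf("planner returned invalid JSON: %w", err)
	}
	if n := len(p.SubQuestions); n < 4 || n > 6 {
		return Plan{}, fmt.Errorf("expected 4-6 sub-questions, got %d", n)
	}
	return p, nil
}
```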

Design patterns
- Goroutines + sync.WaitGroup for the parallel phase: no locks, Postgres handles concurrency
- Semantic dedup via Voyage embeddings (1024d) + pgvector IVFFlat cosine >= 0.92
- Prompt caching on every system prompt (planner, researchers, critic, synthesizer)
- Per-tier token accounting (input / output / cache-read / cache-write) with a configurable spend alert (cost sketch after this list)
- SSE -> live DAG: the user watches each node change colour in real time (bridge sketched after this list)
- Contradictions surface, never averaged: the final report makes them explicit
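A sketch of the per-tier accounting; the rate fields are configuration placeholders, not actual pricing:

```go
// Per-tier accounting sketch. Rates are configuration, not hard-coded
// pricing; the field layout is illustrative.
package pipeline

type Usage struct {
	Input, Output, CacheRead, CacheWrite int64 // token counts per tier
}

type Rates struct {
	Input, Output, CacheRead, CacheWrite float64 // USD per million tokens
}

// CostUSD prices a usage record tier by tier.
func (u Usage) CostUSD(r Rates) float64 {
	const m = 1e6
	return float64(u.Input)/m*r.Input +
		float64(u.Output)/m*r.Output +
		float64(u.CacheRead)/m*r.CacheRead +
		float64(u.CacheWrite)/m*r.CacheWrite
}

// Alert fires once the running total crosses the configured budget.
func Alert(total, budget float64, notify func(spent float64)) {
	if total >= budget {
		notify(total)
	}
}
```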
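And a sketch of the Redis-to-SSE bridge behind the live DAG, using go-redis v9 and net/http; the channel name matches the fan-out sketch above and is equally illustrative:

```go
// SSE bridge sketch: subscribes to the channel the pipeline publishes
// to and relays each message as one SSE frame, which the frontend's
// EventSource turns into a DAG node colour change.
package pipeline

import (
	"fmt"
	"net/http"

	"github.com/redis/go-redis/v9"
)

func sseHandler(rdb *redis.Client) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")

		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}

		ctx := r.Context()
		sub := rdb.Subscribe(ctx, "pipeline:events")
		defer sub.Close()

		for {
			select {
			case <-ctx.Done():
				return // client disconnected
			case msg, ok := <-sub.Channel():
				if !ok {
					return
				}
				fmt.Fprintf(w, "data: %s\n\n", msg.Payload)
				flusher.Flush()
			}
		}
	}
}
```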
Tech stack
- Go 1.25 (API, orchestration, sync primitives)
- Claude Sonnet 4.6 + web_search_20260209 tool
- Voyage embeddings (1024 dimensions)
- PostgreSQL + pgvector (IVFFlat, cosine)
- Redis (pub/sub + job queue)
- Next.js 16 + EventSource (SSE) + DAG visualizer
- Docker Compose + GitHub Actions -> GHCR
Outcomes
The full pipeline shipped in a single day: 4 agents, 4 Docker services, CI/CD to GHCR, and a frontend with a live DAG and SSE streaming.
Demonstrates a systems approach: multi-agent + Go concurrency + vector dedup + cost discipline. Not another LLM wrapper, but an orchestrator that makes reasoning transparent.