Case study - Personal project
PipelineAI - Autonomous multi-agent research
A system that decomposes a question into 4-6 sub-questions, runs researchers in parallel, has a critic review them, then synthesizes. Four Docker services, a live DAG, and a cost discipline that avoids billing surprises.
- DAG stages: 4
- Services orchestrated: 4
- Cosine dedup threshold: 0.92
- Critic rounds: 2 max
The brief
A single LLM agent is powerful but limited. Hand it a complex question and it charges off in a single direction, ignores contradictions, and hands back a plausible but opaque draft. PipelineAI explores the other approach: decompose, parallelise, critique, synthesize.
A personal project pushing multi-agent orchestration into a real distributed system (Go + Postgres + Redis + Next.js), not a single-thread Python script.
The challenge
When you fan out N researchers in parallel on the same question, you get duplicates, silent contradictions, and an LLM bill that explodes. You need to dedup findings, surface conflicts without averaging them away, and account for every token so a review loop can't spiral indefinitely.
And the whole thing has to stay legible: the user wants to watch the pipeline reason, not stare at a spinner for 90 seconds.
The solution
A 4-stage DAG (Planner → Researchers x N → Critic → Synthesizer) governed by a hard cap: the critic can re-trigger an investigation, but never more than 2 rounds. Unresolved contradictions surface in the final report, never hidden.
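A minimal sketch of that cap, in the spirit of the orchestrator. The types and the runCritic / runResearchers helpers are illustrative stand-ins, not the project's actual API:

```go
// Sketch of the critic round cap. Finding, CriticVerdict, runCritic and
// runResearchers are illustrative stand-ins for the real pipeline types.
package pipeline

import "context"

type Finding struct{ Claim, Source string }

type CriticVerdict struct {
	Contradictions []string // unresolved conflicts, kept for the report
	FollowUps      []string // targeted sub-questions to re-investigate
}

// Stand-ins for the real LLM-backed calls.
var (
	runCritic      func(context.Context, []Finding) (CriticVerdict, error)
	runResearchers func(context.Context, []string) ([]Finding, error)
)

const maxCriticRounds = 2

// criticPhase runs the critic at most maxCriticRounds times. Anything
// still contradictory after the cap is returned, not retried: the
// synthesizer surfaces it in the final report.
func criticPhase(ctx context.Context, findings []Finding) ([]Finding, []string, error) {
	var unresolved []string
	for round := 1; round <= maxCriticRounds; round++ {
		verdict, err := runCritic(ctx, findings)
		if err != nil {
			return nil, nil, err
		}
		unresolved = verdict.Contradictions
		if len(verdict.FollowUps) == 0 {
			break // nothing left worth re-investigating
		}
		extra, err := runResearchers(ctx, verdict.FollowUps)
		if err != nil {
			return nil, nil, err
		}
		findings = append(findings, extra...)
	}
	return findings, unresolved, nil
}
```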
Findings are embedded (Voyage, 1024d) and deduplicated against the session via pgvector at cosine similarity >= 0.92. System prompts use Anthropic's prompt caching. Every call is tracked per tier (input / output / cache-read / cache-write) with a configurable spend alert.
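The dedup check could look like the sketch below (pgvector-go over database/sql). The findings table and its columns are assumptions, and since pgvector's <=> operator returns cosine distance, the 0.92 similarity floor becomes a 0.08 distance ceiling:

```go
// Dedup sketch. Table and column names are assumed; <=> is pgvector's
// cosine *distance* operator, so similarity >= 0.92 means distance <= 0.08.
package pipeline

import (
	"context"
	"database/sql"

	"github.com/pgvector/pgvector-go"
)

const dedupSimilarity = 0.92

// isDuplicate reports whether an embedding is too close to any finding
// already stored for this session.
func isDuplicate(ctx context.Context, db *sql.DB, sessionID string, emb []float32) (bool, error) {
	var exists bool
	err := db.QueryRowContext(ctx, `
		SELECT EXISTS (
		    SELECT 1 FROM findings
		    WHERE session_id = $1
		      AND embedding <=> $2 <= $3
		)`, sessionID, pgvector.NewVector(emb), 1-dedupSimilarity).Scan(&exists)
	return exists, err
}
```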
The pipeline
Four stages, orchestrated by the Go backend with sync.WaitGroup for the parallel phase. Events are published to Redis pub/sub, then streamed via SSE to the DAG visualizer on the frontend.
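A sketch of the fan-out, with an illustrative channel name and event payload. Each goroutine writes only its own slice index, which is why no mutex is needed:

```go
// Fan-out sketch: one goroutine per sub-question, joined with a
// WaitGroup, status events published to Redis for the SSE stream.
// The channel name, event shape and research helper are illustrative.
package pipeline

import (
	"context"
	"fmt"
	"sync"

	"github.com/redis/go-redis/v9"
)

type Finding struct{ Claim, Source string }

var research func(context.Context, string) []Finding // stand-in for the real loop

func runParallel(ctx context.Context, rdb *redis.Client, subQs []string) [][]Finding {
	results := make([][]Finding, len(subQs))
	var wg sync.WaitGroup
	for i, q := range subQs {
		wg.Add(1)
		go func(i int, q string) {
			defer wg.Done()
			rdb.Publish(ctx, "pipeline:events",
				fmt.Sprintf(`{"node":"researcher-%d","state":"running"}`, i))
			results[i] = research(ctx, q) // web_search -> extract claims -> embed
			rdb.Publish(ctx, "pipeline:events",
				fmt.Sprintf(`{"node":"researcher-%d","state":"done"}`, i))
		}(i, q)
	}
	wg.Wait() // the critic only starts once every researcher has joined
	return results
}
```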
Planner
Stage 1: Decomposes the initial question into 4 to 6 independent sub-questions. A single Claude call, schema-validated output (sketched after the stage list).
Researchers
Stage 2 (parallel): N goroutines, each running its own web_search -> extract claims -> embed loop. No locks: Postgres handles concurrency and pgvector dedups.
Critic
Stage 3: Cross-reads findings, detects contradictions, and may trigger a targeted re-investigation. Capped at 2 rounds to avoid expensive loops.
Synthesizer
Stage 4: Assembles the final report: claims with sources, a per-claim confidence score, and unresolved contradictions surfaced explicitly.
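As referenced in the planner stage, a sketch of the output validation; the Plan struct and its field name are assumptions about the schema:

```go
// Illustrative shape of the planner's schema-validated output.
package pipeline

import (
	"encoding/json"
	"fmt"
)

type Plan struct {
	SubQuestions []string `json:"sub_questions"`
}

// parsePlan rejects the planner's output before any researcher is
// spawned: invalid JSON or a count outside 4-6 fails fast.
func parsePlan(raw []byte) (Plan, error) {
	var p Plan
	if err := json.Unmarshal(raw, &p); err != nil {
		return Plan{}, fmt.Errorf("planner returned invalid JSON: %w", err)
	}
	if n := len(p.SubQuestions); n < 4 || n > 6 {
		return Plan{}, fmt.Errorf("expected 4-6 sub-questions, got %d", n)
	}
	return p, nil
}
```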

Design patterns
- Goroutines + sync.WaitGroup for the parallel phase: no locks, Postgres handles concurrency
- Semantic dedup via Voyage embeddings (1024d) + pgvector IVFFlat cosine >= 0.92
- Prompt caching on every system prompt (planner, researchers, critic, synthesizer)
- Per-tier token accounting (input / output / cache-read / cache-write) with a configurable spend alert (cost sketch after this list)
- SSE -> live DAG: the user watches each node change colour in real time (bridge sketched after this list)
- Contradictions surface, never averaged: the final report makes them explicit
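A sketch of the per-tier accounting; the rate fields are configuration placeholders, not actual pricing:

```go
// Per-tier accounting sketch. Rates are configuration, not hard-coded
// pricing; the field layout is illustrative.
package pipeline

type Usage struct {
	Input, Output, CacheRead, CacheWrite int64 // token counts per tier
}

type Rates struct {
	Input, Output, CacheRead, CacheWrite float64 // USD per million tokens
}

// CostUSD prices a usage record tier by tier.
func (u Usage) CostUSD(r Rates) float64 {
	const m = 1e6
	return float64(u.Input)/m*r.Input +
		float64(u.Output)/m*r.Output +
		float64(u.CacheRead)/m*r.CacheRead +
		float64(u.CacheWrite)/m*r.CacheWrite
}

// Alert fires once the running total crosses the configured budget.
func Alert(total, budget float64, notify func(spent float64)) {
	if total >= budget {
		notify(total)
	}
}
```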
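And a sketch of the Redis-to-SSE bridge behind the live DAG, using go-redis v9 and net/http; the channel name matches the fan-out sketch above and is equally illustrative:

```go
// SSE bridge sketch: subscribes to the channel the pipeline publishes
// to and relays each message as one SSE frame, which the frontend's
// EventSource turns into a DAG node colour change.
package pipeline

import (
	"fmt"
	"net/http"

	"github.com/redis/go-redis/v9"
)

func sseHandler(rdb *redis.Client) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")

		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}

		ctx := r.Context()
		sub := rdb.Subscribe(ctx, "pipeline:events")
		defer sub.Close()

		for {
			select {
			case <-ctx.Done():
				return // client disconnected
			case msg, ok := <-sub.Channel():
				if !ok {
					return
				}
				fmt.Fprintf(w, "data: %s\n\n", msg.Payload)
				flusher.Flush()
			}
		}
	}
}
```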
Tech stack
- Go 1.25 (API, orchestration, sync primitives)
- Claude Sonnet 4.6 + web_search_20260209 tool
- Voyage embeddings (1024 dimensions)
- PostgreSQL + pgvector (IVFFlat, cosine)
- Redis (pub/sub + job queue)
- Next.js 16 + EventSource (SSE) + DAG visualizer
- Docker Compose + GitHub Actions -> GHCR
Outcomes
The full pipeline shipped in a single day: 4 agents, 4 Docker services, CI/CD to GHCR, and a frontend with a live DAG and SSE streaming.
Demonstrates a systems approach: multi-agent + Go concurrency + vector dedup + cost discipline. Not another LLM wrapper, but an orchestrator that makes reasoning transparent.