YC company builds AI agent swarm that beats OpenAI, Google, Anthropic & Perplexity on research benchmarks

by Trevor

Hi everyone,

We recently built + launched Spine Swarm.

You prompt and a team of AI agents spins up, works in parallel on a visual canvas, and delivers finished outputs: market reports, financial models, slide decks, interactive prototypes.

People are using Spine for:

product research, e.g. PRDs, "should we build X?" decisions, and prototyping
competitive analysis
building outreach sequences
creating 50-page cited docs, pitch decks & memos
customer feedback analysis
complex multi-step tasks

The agent swarm just ranked #1 on the Google DeepMind DeepsearchQA benchmark, ahead of Perplexity, Claude, OpenAI and Gemini. DeepsearchQA is the hardest benchmark measuring how well AI answers complex research questions. Accurate deliverables = less busy work for you.

🙏 Our Ask

Check us out here (www.getspine.ai?utm_source=indiehackers) and run a task. It's fully free to get started.

Then let us know in the comments what you build or think!

Thanks everyone,

Trevor

on March 12, 2026

4

Congrats on the launch, Trevor! The parallel agent swarm on a visual canvas is a compelling approach — feels like a real step beyond single-threaded chat. Going to give it a spin for competitive analysis.

shenoah · 2 months ago
3

Interesting concept. The idea of multiple agents working in parallel on complex research tasks is pretty compelling. Curious how you handle coordination between the agents to keep outputs consistent?

ya_app · 2 months ago
1. 2
  
  Ya, curious if they each have an inbox like agent-mail
  
  ozten · 2 months ago
  1. 1
    
    That would actually make sense for coordination between agents.
    
    ya_app · 2 months ago
2

The contracts piece around payment failure clauses is often overlooked. If your SaaS has recurring billing, make sure your ToS is clear on what happens when a card fails -- automatic suspension vs. grace period vs. dunning. Learned this the hard way.

DivenRastdus · 2 months ago
2

Benchmarks are one thing. Cost per run in production is another. Nobody talks about what it actually costs to run a swarm of agents at scale. That's where most teams get surprised.

NovaShips · 2 months ago
2

The consistency angle is underrated. Most people quit right before things start compounding. Your workflow breakdown is gold - thanks for sharing the messy details, not just the polished outcome.

notronhq · 2 months ago
2

The speed of AI tooling right now is insane.
One thing I keep noticing, though, is that while people are building better agents, the distribution problem is still huge — especially for freelancers and small builders trying to find real clients.

I’ve been exploring this while building a small project called ClientRadar that aggregates freelance opportunities from different sources so people don’t have to hunt across platforms all day.

Curious to see how AI agents will change not just building products, but also how people actually find work and customers.

FarhadAsbaghi · 2 months ago
1. 1
  
  Spot on. The barrier to building has completely collapsed, but the barrier to distribution and client acquisition hasn't budged. ClientRadar sounds like exactly the kind of practical, pain-killer tool freelancers actually need right now instead of just another wrapper. Aggregating the noise from all those different job boards into one dashboard tackles a very real bottleneck.
  
  It’s funny you mention the whole freelance/client workflow, because I'm exploring exactly how to solve the next phase of that problem with PainScan. Finding the client is half the battle; actually identifying and locking in on their core problems is the other.
  
  I think you're right: the next wave of AI isn't just going to be agents doing the heavy lifting of building, but agents acting as the connective tissue that handles the business development and client communication side. Drop a link when you have one!
  
  sprdt · 2 months ago
2

Hi, congrats on the launch. How does this compare to Claude and Claude Cowork?

Snoren · 2 months ago
1. 1
  
  Great question. Output is better because the agents are leveraging 300+ models, not just Claude, so they use the best model for the task.
  
  Additionally, the visual workspace lets you audit agent work & make changes.
  
  TrevorBuilds · 2 months ago
1

Agent swarms working in parallel is the next frontier. The coordination overhead between agents is the hard part though. Curious how they handle conflicting outputs.

AIMadeTools · a month ago
1

Beating OpenAI and Google on the DeepsearchQA benchmark is a massive feat. Most AI agents struggle with staying accurate during complex research, so ranking #1 there is impressive. Congrats on the launch, Trevor!

StageAuto · 2 months ago
1
Really interesting direction, Trevor — the visual canvas + parallel agents combo feels like a genuine step forward from linear chat.

What stands out to me isn’t just the benchmark result, it’s the workflow shift. Most tools still force users into a single-threaded thinking model, even for problems that are naturally parallel (like competitive analysis or market mapping). Breaking that into simultaneous agent workstreams is exactly how a real research team would operate.

A couple of things I’m especially curious about after reading through this:
- Coordination layer: How are you preventing overlap or contradictions between agents? Is there a central planner/reviewer agent that reconciles outputs, or is it more emergent behavior from the swarm?
- Cost vs quality: As others mentioned, routing across 300+ models sounds powerful, but also potentially expensive. Are you optimizing for cost per task dynamically, or is it more quality-first right now?
- Output reliability: Benchmarks are great, but the real win is whether I can take a 50-page report and not spend an hour fixing citations or structure. Have you measured “edit time saved” or something similar?
That said, the visual transparency is a big deal. Being able to see agents working (instead of waiting on a black box) builds a lot more trust, and probably makes debugging way easier too.

Going to test this for a competitive analysis workflow. If it can consistently produce something close to client-ready output, that’s where this becomes seriously valuable.

david7373 · 2 months ago
1

This is exactly the kind of transparency the community needs. The dunning email sequence timing -- day 0, 3, 7 -- seems to match what works best based on what I've read. Did you experiment with different intervals?

DivenRastdus · 2 months ago
1

This is next-level for AI productivity. What stands out isn’t just that Spine Swarm ranks #1 on benchmarks; it’s that it orchestrates agents in parallel to handle multi-step, high-cognitive tasks that normally take teams days. The ability to generate PRDs, financial models, or 50-page reports autonomously turns AI from a helper into a full research engine. The key insight here is that true leverage comes from combining multiple specialized agents, each with domain focus, into a coordinated workflow, rather than relying on a single generalist model. If the swarm can maintain accuracy at scale, it redefines what a solo founder or small team can execute without hiring extra headcount.

benj_mrtn · 2 months ago
1

Multi-agent on a visual canvas is a smart UX choice — makes the process feel less like a black box. pureaireview has been covering tools in this space lately, this one seems worth a closer look. What does the output quality look like for financial models specifically?

AImarketfeed · 2 months ago
1

The multi-agent approach makes total sense for complex research workflows. We found a similar principle applies to website building - instead of one AI trying to handle both design and content simultaneously, breaking it into specialized tasks works much better.

For research, your parallel canvas sounds like a game-changer. For WordPress sites, we built Kintsu.ai to work with EXISTING sites through natural conversation - handles themes, plugins, content, all through vibe coding. Different domain, same insight: specialized AI beats generalist every time.

Going to test Spine for competitive analysis. The visual approach to seeing agents collaborate is brilliant.

kintsuai · 2 months ago
1

The parallel agent approach is interesting. One question that comes up a lot with multi-agent systems: how do you handle conflicting information between agents? For example, if two agents research the same topic but find contradictory data points, does the system have a reconciliation layer or does it surface both and let the user decide? That's usually where these systems either shine or fall apart in practice.

docat0209 · 2 months ago
1

Congrats on the launch, Trevor — the concept sounds really interesting. Having multiple agents work in parallel on complex research tasks could save a lot of time, especially for things like market research and competitive analysis. The visual canvas approach also sounds like a nice way to see how the work is being done.

I’ll check it out and try running a task. Curious to see how the outputs compare with other research tools in real-world use.

david7373 · 2 months ago
1

Parallel agents are impressive but the cost question @joshatDHF raised is the elephant in the room for anyone running multi-model workflows day to day. When you're routing tasks across Claude, OpenAI, and whatever else, the token burn compounds fast and it's genuinely hard to track where the money is going without something watching it in real time.

I run a menu bar tool on Mac that aggregates token usage across all my providers (OpenAI, Claude, Gemini, OpenRouter, Cursor, Copilot) so I can see the cost ticking up as I work. Changed how I think about which model to use for what. Before that I'd just get surprised by the bill at month end.

The benchmark results are compelling, but for indie hackers the real question is always going to be: does the output quality justify the token spend per task? Would love to see a cost-per-quality comparison alongside the accuracy benchmarks.

JohnMadison · 2 months ago
1

This is a fascinating approach. The multi-agent architecture mirrors how specialized human research teams work - rather than one generalist, you get specialists with different retrieval strategies collaborating. As someone building with AI agents daily for marketing work, the benchmark gains are exciting. I am curious how coordination overhead scales and whether the gains hold on tasks with messier success criteria than research benchmarks. Would love to see ablation studies on agent count vs better prompting on a single model.

MachineMktgShai · 2 months ago
1

Really cool approach. The visual canvas + parallel agents feels like a big usability upgrade over CLI-based swarms.

Curious what you found hardest to get right:
(1) keeping citations consistent across sources, or
(2) preventing agents from drifting into redundant work?

And do you see DeepsearchQA-style benchmarks correlating with real-world deliverables like market reports/slide decks?

marzun9620 · 2 months ago
1

Spine AI honestly looks like a total game-changer, especially if you're exhausted from wrestling with the terminal just to get an AI agent running. The site pitches a multi-agent "swarm" that works together in a visual, browser-based canvas, completely eliminating the steep learning curve of traditional CLI tools. What really caught my eye is their comparison chart—they specifically call out OpenClaw, highlighting that instead of dealing with manual integrations, self-hosting on your own rig, and debugging single agents, Spine handles everything out of the box with access to hundreds of models. For anyone who has spent hours troubleshooting local bot setups, the promise of jumping straight into a collaborative visual dashboard to orchestrate complex workflows without any of the backend headache is a huge breath of fresh air.

sprdt · 2 months ago
1

The visual canvas showing agents working in parallel is a smart trust-building mechanism. You can actually see what's happening instead of waiting for a black box to finish.

I'm building in a similar philosophical space but for email — a Mac app called Drafted that uses AI to pre-draft email replies. The hardest design decision was the same one you're navigating: how much autonomy to give the AI.

For research, parallel agents make sense because the output is a report you'll review anyway. But for email, the stakes are different — a wrong tone in a reply to a client can cost you the relationship. So we went with zero autonomy on sending: the AI drafts, the human reviews and sends. Every time.

The pattern I keep seeing across AI products is that transparency scales trust. Your visual canvas does it for research. Our confidence scoring (High/Medium/Low on each draft) does it for email. The users who stick around aren't the ones who want full automation — they're the ones who want to go faster while staying in control.

What does your retention look like for power users vs casual ones? Curious if you see the same split.

alpha_compadre · 2 months ago
1

Interesting approach to multi-agent collaboration. I’m curious how you manage coordination between agents to avoid duplicated work or conflicting outputs.

I recently launched a small Excel tool that automatically generates fantasy football lineups and learned how tricky it can be to balance automation with good decision logic. Would love to know how you solved that on the AI side.

fantasygoalhub · 2 months ago
1

AI agents collaborating on research could change how people gather information. Do you think this approach will become common in future AI tools?

IlovePDFApp · 2 months ago
1

The parallel agent approach on a visual canvas is a genuinely different UX from single thread research tools — being able to see multiple agents working simultaneously makes the process legible rather than just waiting for an output to appear.
Ranking above Perplexity and Claude on DeepsearchQA is a bold claim but if the cited 50 page docs hold up in practice that's a real differentiator for consultants and analysts who currently spend days on research compilation.
Going to try it for competitive analysis — that's the use case I'm most curious about for early stage founders.

jayesh_somani_ · 2 months ago
1

The swarm approach makes sense — specialized agents collaborating beats one generalist model trying to do everything. What I want to know is latency and cost at scale. For anyone building on top of these systems, that's where the rubber meets the road.

joshatDHF · 2 months ago
1

Interesting approach with the multi-agent swarm on a visual canvas. The parallel execution model makes sense for research tasks where you can decompose the problem into independent sub-queries.

Curious about the citation accuracy — that's where most AI research tools fall short in practice. Ranking well on DeepsearchQA is impressive, but the real test is whether the cited sources actually support the claims in the output. Have you measured hallucination rates on citation accuracy specifically?

Also, for the competitive analysis use case — how do you handle information freshness? One challenge I've noticed with AI research tools is they can confidently present outdated data alongside current data without distinguishing between them.

WilliamWangAI · 2 months ago
1

This looks like a powerful platform for tackling complex multi-step tasks. I’m curious to see how well the AI agent swarm handles research and prototyping in real scenarios. Definitely checking it out to explore more ideas!

logancarter · 2 months ago
1

Great discussion! I just launched MarketLens, a financial data API with 256 endpoints for stocks, crypto, forex, and technical analysis. Free tier available with 500 API calls per day. Would love feedback from this community!

marketlns · 2 months ago
1

“Something I’ve noticed: two startups with similar traction can get completely different reactions from investors depending on how the founder is perceived publicly.

Does anyone intentionally work on their authority/credibility as a founder?”

themarogee · 2 months ago
1

Interesting concept — the visual canvas with parallel AI agents sounds powerful.

Curious about one thing: how do you coordinate the agents internally?
Is it more like a planner–executor model or independent agents collaborating on subtasks?

Would love to try it for product research workflows.

codewithishwar · 2 months ago
1

Congrats on the launch. The visual canvas makes me think of the early hypertext days. There were many cool UIs that didn't make it long term. I paid a couple hundred dollars for this super malleable canvas with infinite zoom and hyperlinks. Like HyperCard, but even more messy.

Do you think the visual canvas is a transitional UI, or will it be the long-term winner?

ozten · 2 months ago
1

Amazing

Joshualawi · 2 months ago
1

Not fully free. I have to login first :(

StefanJVA · 2 months ago
1. 1
  
  Running the swarm costs money on our end, so we need users to log in so we can track usage. Your data is private and we don't sell any of it, if that makes you feel more comfortable.
  
  TrevorBuilds · 2 months ago
1

Multi-agent systems are starting to feel like the next big shift in how we build AI tools. Instead of one model trying to do everything, you can break work into specialized agents that collaborate.

Curious how much of the performance improvement here comes from orchestration vs model choice.

backendrescue · 2 months ago
1

The visual canvas for agent outputs is a smart call. Seeing agents work in parallel on a spatial layout gives you something a linear chat never can: you can spot when two agents are covering the same ground or missing a gap between them.

Curious about the prompt side of this. When a user says "do competitive analysis on X," how much decomposition happens before the agents start? Does the swarm decide on its own what sub-tasks to split into, or is there a planning step that structures the request first?

That decomposition step is where I keep seeing the biggest leverage. A vague prompt like "research my competitors" has at least five different dimensions buried in it: who the audience is, what format the output should take, which constraints matter, how deep to go. Most people mix all of that into one sentence and hope the model figures it out.

I've been building flompt (https://github.com/Nyrok/flompt) to tackle that input side. It's a visual prompt builder that splits prompts into 12 typed semantic blocks (role, constraints, output format, etc.) and compiles to XML. Open source, 75+ stars. Different layer than what you're doing but same underlying bet: structure beats freeform.

Would be interesting to see what happens when you feed a Spine swarm a well-structured prompt vs a raw one-liner. Bet the output quality gap is massive.
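
To make the "typed blocks compiled to XML" idea concrete, here is a minimal sketch. It is not flompt's actual code or schema: the block names, the `compile_prompt` helper, and the `<prompt>` root element are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET


def compile_prompt(blocks: dict[str, str]) -> str:
    """Compile typed prompt blocks into one XML string.

    Hypothetical schema: one child element per block type under <prompt>.
    """
    root = ET.Element("prompt")
    for block_type, text in blocks.items():
        child = ET.SubElement(root, block_type)
        child.text = text
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(root, encoding="unicode")


# Example blocks (illustrative only): the one-liner "research my competitors"
# split into separate, explicitly typed pieces.
blocks = {
    "role": "You are a market analyst.",
    "objective": "Compare competitors of X.",
    "constraints": "Use sources from the last 12 months.",
    "output_format": "A table with one row per competitor.",
}

xml_prompt = compile_prompt(blocks)
print(xml_prompt)
```

Keeping each dimension in its own element means a tool could validate or reorder blocks before the prompt ever reaches a model.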

Nyrok · 2 months ago
1

This is interesting. The idea of multiple agents working in parallel actually makes sense for research tasks.

One thing I’ve learned though is that the real test is not the benchmark — it’s whether the output saves people time in real work. If the reports and analysis are actually usable without a lot of editing, that’s where the real value is.

bhavin_allinonetools · 2 months ago
1

This makes sense

Iam_jaja · 2 months ago
1

Multi-agent parallelism is the right direction. The bottleneck I keep seeing isn't the AI capability — it's reliable access to the environments agents need to operate in. Document generation is solved. The next frontier is agents that can actually navigate real software.

QCCBotCloudPhone · 2 months ago
1

I found a little bug in the UI, but overall it is very comfortable to use.

shalimooos · 2 months ago
1. 1
  
  What was the bug?
  
  TrevorBuilds · 2 months ago
1

Impressive work. Multi-agent systems are clearly where things are heading.

Curious about one thing: how do you manage coordination between agents to avoid conflicting outputs or redundant work? That’s usually the hardest part with agent swarms.

Also interesting that you’re using a visual canvas. Does that make the workflow easier to debug or guide compared to traditional chat interfaces?

Will definitely try it out.

muhammedyesilmen · 2 months ago
1

Interesting concept. The idea of multiple agents working in parallel on complex research tasks is pretty compelling. Curious how you handle coordination between the agents to keep outputs consistent?

JonySmith · 2 months ago
1

The visual canvas idea sounds really interesting for complex workflows. Did users actually find it easier than a traditional chat interface?

Promptaflow · 2 months ago