
I'm 16 and built a desktop app that controls your PC autonomously — you type a task, it executes everything.

Most automation tools still need you to set up workflows, connect APIs, or write code.

I wanted something different — you just describe what you want in natural language, and it does it.

So I built Flex. It takes full control of your mouse, keyboard, and apps to execute complete workflows autonomously. Works on Windows and Mac.

Real example:
"Check the latest email about March revenue, update the Google Sheet, and post the chart to Discord."

It opened Gmail, read the email, updated the spreadsheet, generated the chart, and posted it to Discord — in 90 seconds, zero clicks from me.

Watch the 90-second demo: https://www.loom.com/share/04c27c6a5e1d48a29e38d8b2c85c7622

Private beta is open — secure your spot now:
👉 https://flexagent.app


Under the hood — the part I'm most proud of:

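The obvious approach (Claude Computer Use, etc.) relies on continuous screenshot-based vision, sending the model a fresh screenshot at every step. It's slow, expensive in vision tokens, and error-prone because the model has to work out exact pixel coordinates.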

I built a 3-Tier Execution Engine that makes vision a last resort:

Tier 1 — OS Accessibility Tree: Direct hook into the OS UI tree. Zero-latency, zero vision tokens. This handles most tasks instantly.

Tier 2 — Set-of-Mark (SoM): If the accessibility tree is blocked (custom canvases, games, etc.), I take a screenshot and number every UI element. The model picks a number — no coordinate guessing, no misclicks.

Tier 3 — Raw Vision: Only if both fail, the model calculates raw X,Y coordinates itself. This is what most agents start with. I use it as a last resort.
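
For readers wondering what that escalation order looks like in code, here is a minimal Python sketch under stated assumptions. It is not Flex's source. `UIElement`, `tier1_accessibility`, `tier2_set_of_mark`, `resolve_target` and `tiers_for` are hypothetical names. The accessibility tree is a small in-memory stand-in for the real OS tree, and the two model calls (picking a mark, predicting raw X,Y) are replaced by plain functions that return fixed answers.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

Point = tuple[int, int]            # (x, y) in screen pixels
Box = tuple[int, int, int, int]    # (left, top, width, height) in screen pixels


def box_center(box: Box) -> Point:
    left, top, width, height = box
    return (left + width // 2, top + height // 2)


@dataclass
class UIElement:
    """Hypothetical accessibility-tree node: role, accessible name, bounding box."""
    role: str
    name: str
    box: Box
    children: list[UIElement] = field(default_factory=list)


def tier1_accessibility(root: Optional[UIElement], role: str, name: str) -> Optional[Point]:
    """Tier 1: find the element by role and name in the tree; no screenshot needed."""
    if root is None:  # tree blocked or unavailable (custom canvas, game, ...)
        return None
    stack = [root]
    while stack:  # iterative depth-first search
        node = stack.pop()
        if node.role == role and node.name == name:
            return box_center(node.box)
        stack.extend(reversed(node.children))
    return None


def tier2_set_of_mark(boxes: list[Box], pick_mark: Callable[[dict[int, Box]], int]) -> Optional[Point]:
    """Tier 2: number every detected box; the model answers with a mark number, not pixels."""
    marks = {i + 1: box for i, box in enumerate(boxes)}
    choice = pick_mark(marks)
    return box_center(marks[choice]) if choice in marks else None


Tier = tuple[str, Callable[[], Optional[Point]]]


def resolve_target(tiers: list[Tier]) -> Optional[tuple[str, Point]]:
    """Try each tier in order; return the first (tier name, point) that succeeds."""
    for tier_name, attempt in tiers:
        point = attempt()
        if point is not None:
            return tier_name, point
    return None  # every tier failed; the caller decides what to do next


# --- Demo with stand-ins for the real OS tree and model calls ---
send_button = UIElement("button", "Send", (400, 300, 80, 30))
window = UIElement("window", "Inbox", (0, 0, 1280, 800), [send_button])


def tiers_for(tree: Optional[UIElement]) -> list[Tier]:
    return [
        ("accessibility", lambda: tier1_accessibility(tree, "button", "Send")),
        # Stand-in for a model that was shown the numbered screenshot and answers "2"
        ("set_of_mark", lambda: tier2_set_of_mark([(10, 10, 50, 20), (400, 300, 80, 30)], lambda marks: 2)),
        # Stand-in for a model predicting raw X,Y coordinates
        ("raw_vision", lambda: (441, 316)),
    ]


print(resolve_target(tiers_for(window)))  # ('accessibility', (440, 315))
print(resolve_target(tiers_for(None)))    # ('set_of_mark', (440, 315))
```

In Tier 2 the model's answer is looked up in a table of known boxes. A wrong answer can therefore only select the wrong element. It can never land on a point between elements, which is the "no coordinate guessing" property described above.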

On top of that:

  • Mouse moves on Bezier curves with variable speeds (human-like, bypasses bot detection)
  • Typing is character-by-character
  • 50+ built-in tools: terminal execution, file management, isolated browser
  • Ultimate Mode: complete human interface takeover across any app
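
The Bezier-curve mouse movement in the first bullet can be illustrated with a short Python sketch. This is an assumption about one common way to do it, not Flex's implementation. `mouse_path`, `cubic_bezier` and `ease_in_out` are hypothetical helpers. The curve is a cubic Bezier whose two control points are pushed sideways by random amounts, and a cosine ease-in-out is applied to the curve parameter. If the points are replayed at a fixed time interval, the cursor therefore starts slowly, speeds up, and slows down again near the target. Actually moving the OS cursor is left out.

```python
import math
import random
from typing import Optional

Point = tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1] (Bernstein form)."""
    u = 1.0 - t
    a, b, c, d = u ** 3, 3 * u ** 2 * t, 3 * u * t ** 2, t ** 3
    return (a * p0[0] + b * p1[0] + c * p2[0] + d * p3[0],
            a * p0[1] + b * p1[1] + c * p2[1] + d * p3[1])


def ease_in_out(s: float) -> float:
    """Cosine easing: small steps at the start and end, large steps in the middle."""
    return 0.5 - 0.5 * math.cos(math.pi * s)


def mouse_path(start: tuple[int, int], end: tuple[int, int], steps: int = 50,
               rng: Optional[random.Random] = None) -> list[tuple[int, int]]:
    """Cursor positions along a randomly bowed cubic Bezier from start to end."""
    rng = rng or random.Random()
    dx, dy = end[0] - start[0], end[1] - start[1]
    distance = math.hypot(dx, dy)
    if distance == 0:
        return [start]
    nx, ny = -dy / distance, dx / distance  # unit vector perpendicular to the straight line

    def control_point(fraction: float) -> Point:
        # A point part-way along the straight line, pushed sideways by a random amount
        offset = rng.uniform(-0.3, 0.3) * distance
        return (start[0] + dx * fraction + nx * offset,
                start[1] + dy * fraction + ny * offset)

    c1 = control_point(rng.uniform(0.2, 0.4))
    c2 = control_point(rng.uniform(0.6, 0.8))
    path = []
    for i in range(steps + 1):
        x, y = cubic_bezier(start, c1, c2, end, ease_in_out(i / steps))
        path.append((round(x), round(y)))
    return path


path = mouse_path((100, 100), (700, 400), steps=40, rng=random.Random(7))
print(path[0], path[-1], len(path))  # (100, 100) (700, 400) 41
```

Passing a seeded `random.Random` makes a path reproducible for testing. Leaving `rng` unset gives a different curve on every call.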

I'm 16, built this entirely solo. No team, no funding, no co-founder.

Happy to answer questions about the build.

https://flexagent.app

on April 8, 2026

    16 and shipping something like this is genuinely impressive. The 3-tier execution engine insight is the right call — continuous screenshot vision is one of those approaches that sounds clean until you're debugging why the cursor landed 3px off at 2am.

    What's your current fallback when Tier 1/2 can't resolve the action? Does it escalate gracefully or just fail hard?


      Thanks! That 2am 3px bug is exactly why I built it this way 😂

      It escalates gracefully. If Tiers 1 and 2 fail, it drops to Tier 3, which leverages Claude's native Computer Use. Computer Use actually has a very low error rate, but the irony is that Claude’s first and only approach is my absolute last resort.

      If even Tier 3 fails or the UI state doesn't change, the system doesn't hard crash. The planner realizes it's stuck and invokes a built-in ask_user tool, pausing execution to ask for human clarification.
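
The "stuck" check described in this reply can be sketched in a few lines of Python. This is a guess at the idea, not Flex's planner. `run_step`, `ui_fingerprint` and the demo functions are hypothetical. The `ask_user` tool named above is modeled as a plain callback because its real interface isn't described. The assumption is that a tier only counts as successful if it reports success *and* a hash of the UI snapshot changes afterwards.

```python
import hashlib
import json
from typing import Callable

Attempt = Callable[[], bool]  # returns True if the tier believes it performed the action


def ui_fingerprint(state: dict) -> str:
    """Stable hash of a UI snapshot (for example, a serialized accessibility tree)."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode("utf-8")).hexdigest()


def run_step(tiers: list[tuple[str, Attempt]],
             read_state: Callable[[], dict],
             ask_user: Callable[[str], str]) -> str:
    """Escalate through the tiers; pause for the human if nothing changes the UI."""
    before = ui_fingerprint(read_state())
    for name, attempt in tiers:
        # Success means the tier reported success AND the UI actually changed
        if attempt() and ui_fingerprint(read_state()) != before:
            return f"done via {name}"
    # Every tier failed or the screen never changed: stop guessing and ask the human
    return ask_user("I'm stuck on this step. What should I do next?")


# --- Demo with a fake screen instead of a real OS ---
screen = {"title": "Inbox", "unread": 3}
questions = []


def fails() -> bool:
    return False


def click_that_works() -> bool:
    screen["unread"] -= 1  # simulate the UI reacting to the click
    return True


def click_that_does_nothing() -> bool:
    return True  # reports success, but nothing on screen changes


def ask_user(question: str) -> str:
    questions.append(question)
    return "paused: waiting for user"


print(run_step([("accessibility", fails), ("set_of_mark", click_that_works)],
               lambda: screen, ask_user))  # done via set_of_mark
print(run_step([("accessibility", fails), ("set_of_mark", click_that_does_nothing), ("raw_vision", fails)],
               lambda: screen, ask_user))  # paused: waiting for user
```

Comparing hashes of the UI snapshot catches the case where a click "succeeds" but lands on nothing. A tier that only reports success is not enough to move on.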

      I’m actively refining these edge cases for the private beta right now. If you're into stress-testing and breaking things, you should grab a spot on the waitlist (https://flexagent.app/) and see if you can make it fail!

      Are you building in the agent space yourself right now?
