I'm 16 and built a desktop app that controls your PC autonomously — you type a task, it executes everything.

Most automation tools still need you to set up workflows, connect APIs, or write code.

I wanted something different — you just describe what you want in natural language, and it does it.

So I built Flex. It takes full control of your mouse, keyboard, and apps to execute complete workflows autonomously. Works on Windows and Mac.

Real example:
"Check the latest email about March revenue, update the Google Sheet, and post the chart to Discord."

It opened Gmail, read the email, updated the spreadsheet, generated the chart, and posted it to Discord — in 90 seconds, zero clicks from me.

Watch the 90-second demo: https://www.loom.com/share/04c27c6a5e1d48a29e38d8b2c85c7622

Private beta is open — secure your spot now:
👉 https://flexagent.app

Under the hood — the part I'm most proud of:

The obvious approach (Claude Computer Use, etc.) relies on continuous screenshot-based vision. It's slow, expensive, and error-prone with coordinates.

I built a 3-Tier Execution Engine that makes vision a last resort:

Tier 1 — OS Accessibility Tree: Direct hook into the OS UI tree. No screenshots and no vision tokens, so latency is negligible compared to a vision round trip. This handles most tasks almost instantly.

Tier 2 — Set-of-Mark (SoM): If the accessibility tree is blocked (custom canvases, games, etc.), I take a screenshot and number every UI element. The model picks a number — no coordinate guessing, no misclicks.

Tier 3 — Raw Vision: Only if both fail, the model calculates raw X,Y coordinates itself. This is what most agents start with. I use it as a last resort.
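
A minimal sketch of how a tiered fallback like this could be wired up. This is not Flex's actual code: the names `Click`, `resolve_click`, `accessibility_resolver`, `som_resolver`, and `pick_by_label` are hypothetical, and the accessibility tree, the numbered marks, and the model call are all stubbed with plain dictionaries and a lookup function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]            # left, top, width, height
Point = Tuple[int, int]
Resolver = Callable[[str], Optional[Point]]


@dataclass
class Click:
    x: int
    y: int
    tier: str  # name of the tier that produced the coordinates


def center(box: Box) -> Point:
    left, top, width, height = box
    return (left + width // 2, top + height // 2)


def accessibility_resolver(tree: Dict[str, Box]) -> Resolver:
    """Tier 1 stand-in: look the element up by name in a stubbed UI tree."""
    def resolve(target: str) -> Optional[Point]:
        box = tree.get(target)
        return center(box) if box is not None else None
    return resolve


def som_resolver(marks: Dict[int, Tuple[str, Box]],
                 pick_mark: Callable[[str, Dict[int, str]], Optional[int]]) -> Resolver:
    """Tier 2 stand-in: the model sees numbered labels and answers with a number."""
    def resolve(target: str) -> Optional[Point]:
        labels = {number: label for number, (label, _) in marks.items()}
        number = pick_mark(target, labels)
        if number is None or number not in marks:
            return None
        return center(marks[number][1])
    return resolve


def resolve_click(target: str, tiers: List[Tuple[str, Resolver]]) -> Optional[Click]:
    """Try each tier in order and stop at the first one that finds the target."""
    for name, resolver in tiers:
        point = resolver(target)
        if point is not None:
            return Click(point[0], point[1], name)
    return None


# Stubbed "model": picks the mark whose label matches the target exactly.
def pick_by_label(target: str, labels: Dict[int, str]) -> Optional[int]:
    return next((n for n, label in labels.items() if label == target), None)


# Stubbed data; a real agent would query the OS and a vision model here.
tiers: List[Tuple[str, Resolver]] = [
    ("accessibility", accessibility_resolver({"Send": (100, 200, 80, 40)})),
    ("set-of-mark", som_resolver({1: ("Play", (10, 10, 20, 20)),
                                  2: ("Pause", (50, 10, 20, 20))}, pick_by_label)),
    ("raw-vision", lambda target: (5, 5) if target == "Canvas spot" else None),
]

print(resolve_click("Send", tiers))    # resolved by the first tier
print(resolve_click("Pause", tiers))   # falls through to the second tier
```

The point of the ordering is that the cheap, exact lookup runs first and the vision-based paths only run when it returns nothing.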

On top of that:

Mouse moves on Bezier curves with variable speeds (human-like, bypasses bot detection)
Typing is character-by-character
50+ built-in tools: terminal execution, file management, isolated browser
Ultimate Mode: complete human interface takeover across any app
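
For anyone curious what the Bezier mouse movement and character-by-character typing could look like, here is a rough sketch under my own assumptions. It only computes the path and the keystroke schedule; actually moving the cursor or sending keys would need an input library, which is left out. The function names, the random control-point offsets, the smoothstep easing, and the delay range are all illustrative choices, not Flex's real parameters.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def smoothstep(t: float) -> float:
    """Ease-in/ease-out: slow near both ends, faster in the middle."""
    return t * t * (3.0 - 2.0 * t)


def mouse_path(start: Point, end: Point, steps: int = 30, jitter: float = 80.0,
               rng: Optional[random.Random] = None) -> List[Tuple[int, int]]:
    """Return steps + 1 integer points from start to end along a curved path."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    rng = rng or random.Random()

    def control(frac: float) -> Point:
        # A point on the straight line, pushed off it by a random offset.
        return (start[0] + (end[0] - start[0]) * frac + rng.uniform(-jitter, jitter),
                start[1] + (end[1] - start[1]) * frac + rng.uniform(-jitter, jitter))

    c1, c2 = control(1 / 3), control(2 / 3)
    path = []
    for i in range(steps + 1):
        # Equal time steps mapped through the easing give variable speed.
        x, y = cubic_bezier(start, c1, c2, end, smoothstep(i / steps))
        path.append((round(x), round(y)))
    return path


def keystrokes(text: str, min_delay: float = 0.03, max_delay: float = 0.12,
               rng: Optional[random.Random] = None) -> List[Tuple[str, float]]:
    """Pair each character with a randomized pause (in seconds) before it."""
    rng = rng or random.Random()
    return [(ch, rng.uniform(min_delay, max_delay)) for ch in text]


path = mouse_path((100, 100), (600, 400), steps=20, rng=random.Random(1))
print(path[0], path[-1], len(path))
```

Because the easing is applied to the curve parameter, points bunch up near the start and end of the path, which is what produces the slow-fast-slow motion.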

I'm 16 and built this entirely solo. No team, no funding, no co-founder.

Happy to answer questions about the build.

https://flexagent.app

Yousef Saleh

on April 8, 2026

16 and shipping something like this is genuinely impressive. The 3-tier execution engine insight is the right call — continuous screenshot vision is one of those approaches that sounds clean until you're debugging why the cursor landed 3px off at 2am.

What's your current fallback when Tier 1/2 can't resolve the action? Does it escalate gracefully or just fail hard?

AmandaBrown · 3 months ago
  
  Thanks! That 2am 3px bug is exactly why I built it this way 😂
  
  It escalates gracefully. If Tier 1 and Tier 2 both fail, it drops to Tier 3, which leverages Claude's native Computer Use. Tier 3 has a very low error rate on its own; it is just slower and more expensive. The irony is that Claude’s first and only approach is my absolute last resort.
  
  If even Tier 3 fails or the UI state doesn't change, the system doesn't hard crash. The planner realizes it's stuck and invokes a built-in ask_user tool, pausing execution to ask for human clarification.
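
Roughly, the escalation described above could be sketched like this. It is a simplified illustration, not the real planner: `run_step`, the `snapshot` callback, and the fake UI are hypothetical names, and "UI state" is reduced to a single comparable value. Only the `ask_user` tool name comes from the actual system.

```python
from typing import Callable, Hashable, List, Tuple

Attempt = Callable[[], None]


def run_step(attempts: List[Tuple[str, Attempt]],
             snapshot: Callable[[], Hashable],
             ask_user: Callable[[str], str]) -> str:
    """Try each tier in order; count it as success only if the UI state changed."""
    before = snapshot()
    for name, attempt in attempts:
        try:
            attempt()
        except Exception:
            continue  # this tier could not act; escalate to the next one
        if snapshot() != before:
            return f"done via {name}"
    # Every tier failed or nothing changed: pause and ask the human.
    answer = ask_user("I could not complete this step. What should I do?")
    return f"paused, user said: {answer}"


# Demo with a fake UI whose whole state is a click counter.
ui = {"clicks": 0}


def tier1() -> None:
    raise RuntimeError("accessibility tree unavailable")


def tier2() -> None:
    pass  # acts on the wrong element, so the UI state does not change


def tier3() -> None:
    ui["clicks"] += 1


def read_ui() -> int:
    return ui["clicks"]


print(run_step([("tier1", tier1), ("tier2", tier2), ("tier3", tier3)],
               read_ui, lambda question: "skip"))
```

Checking the state after each attempt is what catches the "clicked but nothing happened" case, which a plain try/except would miss.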
  
  I’m actively refining these edge cases for the private beta right now. If you're into stress-testing and breaking things, you should grab a spot on the waitlist (https://flexagent.app/) and see if you can make it fail!
  
  Are you building in the agent space yourself right now?
  
  yousefsaleh_dev · 3 months ago