Most automation tools still need you to set up workflows, connect APIs, or write code.
I wanted something different — you just describe what you want in natural language, and it does it.
So I built Flex. It takes full control of your mouse, keyboard, and apps to execute complete workflows autonomously. Works on Windows and Mac.
Real example:
"Check the latest email about March revenue, update the Google Sheet, and post the chart to Discord."
It opened Gmail, read the email, updated the spreadsheet, generated the chart, and posted it to Discord — in 90 seconds, zero clicks from me.
Watch the 90-second demo: https://www.loom.com/share/04c27c6a5e1d48a29e38d8b2c85c7622
Private beta is open — secure your spot now:
👉 https://flexagent.app
Under the hood — the part I'm most proud of:
The obvious approach (Claude Computer Use, etc.) relies on continuous screenshot-based vision. It's slow, expensive, and error-prone with coordinates.
I built a 3-Tier Execution Engine that makes vision a last resort:
Tier 1 — OS Accessibility Tree: Direct hook into the OS UI tree. Zero-latency, zero vision tokens. This handles most tasks instantly.
Tier 2 — Set-of-Mark (SoM): If the accessibility tree is blocked (custom canvases, games, etc.), I take a screenshot and number every UI element. The model picks a number — no coordinate guessing, no misclicks.
Tier 3 — Raw Vision: Only if both fail, the model calculates raw X,Y coordinates itself. This is what most agents start with. I use it as a last resort.
On top of that:
I'm 16, built this entirely solo. No team, no funding, no co-founder.
Happy to answer questions about the build.
16 and shipping something like this is genuinely impressive. The 3-tier execution engine insight is the right call — continuous screenshot vision is one of those approaches that sounds clean until you're debugging why the cursor landed 3px off at 2am.
What's your current fallback when Tier 1/2 can't resolve the action? Does it escalate gracefully or just fail hard?
Thanks! That 2am 3px bug is exactly why I built it this way 😂
It escalates gracefully. If Tier 1 & 2 fail, it drops to Tier 3, which leverages Claude's native Computer Use. It actually has a very low error rate, but the irony here is that Claude’s first and only approach is my absolute last resort.
If even Tier 3 fails or the UI state doesn't change, the system doesn't hard crash. The planner realizes it's stuck and invokes a built-in ask_user tool, pausing execution to ask for human clarification.
I’m actively refining these edge cases for the private beta right now. If you're into stress-testing and breaking things, you should grab a spot on the waitlist (https://flexagent.app/) and see if you can make it fail!
Are you building in the agent space yourself right now?