How I built a 150K-line Android automation app — and what I learned the hard way

I'm Vasyl, an Android developer. A couple of years ago I got frustrated with how hard it is to automate repetitive tasks on mobile. On desktop you write a script. On Android you hit a wall.

So I built Automator — a macro and AI agent app. Here's what that actually looked like from the inside.

The core: Accessibility API

Android's Accessibility API lets you read screen content, tap elements, and enter text — no root needed. It sounds straightforward. It isn't.

Every app builds its UI differently. Some use standard Views; others use a custom Canvas, Flutter, or React Native, which often expose little or no usable element tree. I ended up building a fallback lookup chain: first try contentDescription, then text, then on-screen coordinates found via OCR.
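
A minimal sketch of what such a fallback chain can look like. It is plain Kotlin so it runs anywhere; UiNode, Target, Locator and the ocr callback are hypothetical stand-ins I made up for illustration (in the app the node data would come from AccessibilityNodeInfo), not Automator's actual code.

```kotlin
// Hypothetical stand-in for the fields read from an accessibility node.
data class UiNode(
    val contentDescription: String?,
    val text: String?,
    val centerX: Int,
    val centerY: Int,
)

// Where to tap, plus which strategy found it (handy for logging).
data class Target(val x: Int, val y: Int, val via: String)

// A strategy returns a target, or null if it cannot find one.
typealias Locator = (label: String) -> Target?

fun byContentDescription(nodes: List<UiNode>): Locator = { label ->
    nodes.firstOrNull { it.contentDescription.equals(label, ignoreCase = true) }
        ?.let { Target(it.centerX, it.centerY, "contentDescription") }
}

fun byText(nodes: List<UiNode>): Locator = { label ->
    nodes.firstOrNull { it.text.equals(label, ignoreCase = true) }
        ?.let { Target(it.centerX, it.centerY, "text") }
}

// `ocr` is a hypothetical hook: label -> center of the matching text on screen, or null.
fun byOcr(ocr: (String) -> Pair<Int, Int>?): Locator = { label ->
    ocr(label)?.let { (x, y) -> Target(x, y, "ocr") }
}

// Try each locator in order; the first non-null result wins.
fun locate(label: String, chain: List<Locator>): Target? =
    chain.firstNotNullOfOrNull { it(label) }
```

Calling locate("Send", listOf(byContentDescription(nodes), byText(nodes), byOcr(ocr))) only reaches the expensive OCR step when both cheaper lookups come back empty.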

Then there's the background process problem. Since Android 8, the OS aggressively stops background services. A macro scheduled for an hour later might come due with no service alive to execute it. The fix: a foreground service, wake lock management, and restart recovery.
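
The restart-recovery part is the easiest to show in isolation. The sketch below is an assumption about how it could work, not the app's real logic: ScheduledMacro, Recovery and the grace window are names and rules I invented for illustration. On restart, anything that came due while the service was dead either runs immediately or is reported as expired.

```kotlin
// Hypothetical persisted schedule entry; times are epoch milliseconds.
data class ScheduledMacro(val id: String, val dueAtMs: Long, val done: Boolean = false)

data class Recovery(val runNow: List<ScheduledMacro>, val expired: List<ScheduledMacro>)

// Called when the service (re)starts. Pending macros whose due time has passed
// are run right away if they are at most `graceMs` late (an assumed policy),
// otherwise they are returned as expired so the user can be told.
fun recoverMissed(all: List<ScheduledMacro>, nowMs: Long, graceMs: Long): Recovery {
    val overdue = all.filter { !it.done && it.dueAtMs <= nowMs }
    val (runNow, expired) = overdue.partition { nowMs - it.dueAtMs <= graceMs }
    return Recovery(runNow.sortedBy { it.dueAtMs }, expired)
}
```

Macros that are not yet due are left alone here; they would simply be rescheduled as usual.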

When Accessibility isn't enough: Device Owner API

For things like app blocking or network control, Accessibility just doesn't have the permissions. Device Owner does — but activating it without root requires ADB or a setup-time QR code, which most users have never heard of. I built the full activation flow into the app itself, with step-by-step guidance.

The tricky part isn't technical — it's trust. Device Owner touches sensitive operations. I spent a lot of time making sure nothing happens in the background without explicit user confirmation.

OCR for apps that ignore Accessibility entirely

Games and heavily custom UIs often render everything to a Canvas. There's no element tree — just pixels. I integrated ML Kit Text Recognition with a few optimizations to make it practical: scan only the relevant screen region, cache results for static elements, run async so the main thread stays unblocked.
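
A rough sketch of the region-plus-cache idea, kept free of Android and ML Kit types so it runs as plain Kotlin. Region, OcrCache, the recognize hook and the pixel fingerprint are all hypothetical; the real app presumably crops a screenshot and calls the recognizer asynchronously, which is not shown here.

```kotlin
// Screen region to scan; a hypothetical stand-in for a platform rectangle type.
data class Region(val left: Int, val top: Int, val right: Int, val bottom: Int)

// Caches recognized text per (region, content fingerprint).
// `recognize` is a hypothetical hook that would crop the screenshot to the
// region and hand it to the OCR engine; here it is just a function parameter.
class OcrCache(private val recognize: (Region) -> String) {
    private val cache = HashMap<Pair<Region, Int>, String>()

    // Number of times the (expensive) engine was actually invoked.
    var engineCalls = 0
        private set

    // `fingerprint` is assumed to be a cheap hash of the region's pixels.
    // Static elements keep the same fingerprint, so they are recognized once.
    fun textIn(region: Region, fingerprint: Int): String =
        cache.getOrPut(region to fingerprint) {
            engineCalls++
            recognize(region)
        }
}
```

The cache in this sketch grows without bound; a real one would need eviction.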

It works well for most cases. Rotated text, animations, low contrast — those still trip it up. I added retry logic with different image preprocessing parameters for those.
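
The retry loop itself can be sketched generically. The Preprocess fields below are guesses at the kind of knobs involved (the actual parameters aren't listed here), and recognize and isGood are hypothetical callbacks.

```kotlin
// Hypothetical preprocessing knobs; illustrative only.
data class Preprocess(
    val contrast: Double = 1.0,
    val rotationDeg: Int = 0,
    val scale: Double = 1.0,
)

// Runs `recognize` once per parameter set, in order, and returns the first
// result accepted by `isGood` together with the parameters that produced it.
// Returns null if every attempt fails.
fun <T : Any> recognizeWithRetry(
    attempts: List<Preprocess>,
    recognize: (Preprocess) -> T?,
    isGood: (T) -> Boolean,
): Pair<T, Preprocess>? {
    for (params in attempts) {
        val result = recognize(params) ?: continue
        if (isGood(result)) return result to params
    }
    return null
}
```

Ordering the attempts from cheapest to most aggressive keeps the common case fast.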

On-device AI — no external APIs

The AI agent takes natural language instructions and executes them: "Every morning at 9, open Telegram, find the last message from Irina, copy it and email it to me."

Everything runs locally via LiteRT (Google's successor to TensorFlow Lite). No data leaves the device, works offline, no subscriptions. The migration from TFLite was smooth precisely because inference lives in its own isolated module — swapped the dependency, adjusted a few calls, done.

The tradeoff vs cloud LLMs is real — the model is less capable. But for intent understanding and breaking tasks into steps, it's enough. The user brings their own model: something lightweight and quantized for speed, or heavier for accuracy — their call.

The harder part was the action planner. The AI generates a JSON plan; a custom interpreter translates that into Accessibility API calls. Essentially a tiny virtual machine inside the app.
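
To make the "tiny virtual machine" idea concrete, here is a stripped-down sketch. The step vocabulary (OpenApp, Tap, TypeText) and the Device interface are hypothetical, and JSON decoding is left out: the interpreter starts from steps that have already been parsed. A real Device implementation would wrap Accessibility API calls.

```kotlin
// Hypothetical step vocabulary; the real plan schema is not shown in this post.
sealed interface Step
data class OpenApp(val packageName: String) : Step
data class Tap(val label: String) : Step
data class TypeText(val text: String) : Step

// The device side, abstracted so the interpreter can be tested off-device.
interface Device {
    fun openApp(packageName: String): Boolean
    fun tap(label: String): Boolean
    fun typeText(text: String): Boolean
}

sealed interface RunResult
object Completed : RunResult
data class FailedAt(val index: Int, val step: Step) : RunResult

// Executes steps in order and stops at the first one the device reports as failed.
fun execute(plan: List<Step>, device: Device): RunResult {
    plan.forEachIndexed { index, step ->
        val ok = when (step) {
            is OpenApp -> device.openApp(step.packageName)
            is Tap -> device.tap(step.label)
            is TypeText -> device.typeText(step.text)
        }
        if (!ok) return FailedAt(index, step)
    }
    return Completed
}
```

Because Step is sealed, the compiler flags every `when` that forgets a newly added step type, which helps when the model's output vocabulary grows.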

150K lines of Kotlin

The project grew feature by feature. At some point I looked up and saw 150,000 lines. Not scary — but it demands structure.

Nothing revolutionary: each major feature lives in its own module, modules communicate via events rather than direct calls, and every non-obvious decision has a comment explaining why. That last habit saves the most time — six months later you forget everything.
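
For readers who haven't used the pattern, "events rather than direct calls" can be as small as the sketch below. This is a generic, synchronous illustration with made-up event names, not the bus Automator actually uses.

```kotlin
import kotlin.reflect.KClass

// Hypothetical events; names are illustrative only.
interface Event
data class MacroFinished(val macroId: String, val success: Boolean) : Event
data class ScreenTextFound(val text: String) : Event

// Minimal synchronous bus: publishers and subscribers share only event types,
// so modules never reference each other directly.
class EventBus {
    private val handlers = mutableMapOf<KClass<out Event>, MutableList<(Event) -> Unit>>()

    // The cast is safe because handlers are stored and looked up by exact event class.
    @Suppress("UNCHECKED_CAST")
    fun <E : Event> subscribe(type: KClass<E>, handler: (E) -> Unit) {
        handlers.getOrPut(type) { mutableListOf() }.add { event -> handler(event as E) }
    }

    // Delivers the event to every handler registered for its exact class.
    fun publish(event: Event) {
        handlers[event::class]?.forEach { it(event) }
    }
}
```

A scheduler module can then call bus.subscribe(MacroFinished::class) { ... } without knowing which module runs macros.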

The built-in HTTP server

One feature I didn't expect to be popular: Automator runs a local HTTP server. You can trigger macros from a browser, another device on the same network, or integrate with Home Assistant. It turns your phone into a controllable IoT node.
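
As an illustration only, the request-to-macro mapping behind such a server might look like this. The /macros/<name>/run path, the shared token and every name in the snippet are assumptions made for the example, not Automator's documented endpoints, and the socket handling is omitted.

```kotlin
// Hypothetical request/response shapes; not the app's documented API.
data class Request(val path: String, val token: String?)
data class Response(val status: Int, val body: String)

// Assumed route: /macros/<name>/run, where <name> is letters, digits, '_' or '-'.
private val runRoute = Regex("^/macros/([A-Za-z0-9_-]+)/run$")

// `runMacro` is a hypothetical hook that returns false when no macro has that name.
fun handle(request: Request, expectedToken: String, runMacro: (String) -> Boolean): Response {
    // Other devices on the network can reach the server, so check a shared token first.
    if (request.token != expectedToken) return Response(401, "unauthorized")
    val match = runRoute.matchEntire(request.path) ?: return Response(404, "not found")
    val name = match.groupValues[1]
    return if (runMacro(name)) Response(200, "started $name")
    else Response(404, "unknown macro $name")
}
```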

Honest limitations

Every major Android release can break something. Android 14 and 15 both tightened Accessibility restrictions — I had to adapt each time.

Banking apps actively detect and block automation tools. Macros are fragile against UI changes: a moved button breaks them. Complex workflows still need some technical understanding from the user.

I'm not pretending this is solved. It's a hard space.

Want to try it?

The app is live on Google Play:
https://play.google.com/store/apps/details?id=com.automator.app

I'm running a closed beta — free for all testers, Google Play issues the license automatically. To join, I need to add your Gmail to the tester list. Just drop your Gmail address in the comments and I'll add you. Once added, you can install via:
https://play.google.com/apps/testing/com.automator.app

Looking for people who actually want to automate something real and tell me what's confusing. Happy to answer any technical questions in the comments — especially around Accessibility API, LiteRT on Android, or architecture for large mobile apps.


This resonates hard with Android utilities. I’m building Kinetic Override in a much narrower lane — no-root Android 15+ tap/swipe macro recording — and the hard parts are also permissions, real-device behavior, and timing edge cases rather than the first UI mockup.

herold33 · 3 days ago
  
  Sounds like you know the pain firsthand 😄 Would love to hear how Kinetic Override handles timing edge cases — always useful to compare approaches. If you want to poke at Automator from the outside, drop your Gmail and I'll add you to the beta.
  
  automator_android · 8 hours ago