When you build a platform for other creators, analytics isn't optional. It's a promise.
If someone puts their work on Rubies Unleashed, they deserve to know what's happening to it. Not just a view count. Real data. Where their audience is coming from. Whether visitors are converting to downloads. What percentage of viewers are saving their project. Whether anyone is actually coming back.
I shipped the full analytics system last week. Here's everything: the decisions, the architecture, the tradeoffs, and what I'd do differently.
The fast version: install Plausible or Fathom, configure a few custom events, pipe the data into a dashboard. Done in two days.
The problem: Plausible doesn't know what "Explore page" means. It doesn't know the difference between a view that came from the Editor's Choice spotlight vs a view that came from a search result on the platform. It sees pageviews and custom events. It doesn't see the internal topology of my application.
To get source-level attribution, I'd have to instrument every navigation point manually, pass custom properties with every event, and then build a custom dashboard on top of it. At that point I'm doing the same work, just with worse data locality.
So I built it in Postgres, where the analytics data lives next to the content data.
Two event tables: one for views, one for downloads. Each row stores a project ID, a hashed visitor fingerprint derived from IP and user agent (the raw values are never stored), a generated date column for deduplication, the traffic source, and whether the viewer was authenticated or a guest.
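A minimal sketch of how a fingerprint like this could be computed on the server, assuming SHA-256 and a server-side salt. Both are assumptions for illustration, as is the `viewerHash` name. The post only says the hash is derived from IP and user agent.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hash IP + user agent so the raw values never reach the database.
// The salt is an assumption; it makes the hash harder to reverse by brute-forcing IPs.
function viewerHash(ip: string, userAgent: string, salt: string): string {
  return createHash("sha256")
    .update(`${salt}|${ip}|${userAgent}`)
    .digest("hex");
}
```

The same visitor on the same browser produces the same hash, which is what lets the database deduplicate without knowing who the visitor is.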
The unique constraint is on project, viewer hash, and day. One view per visitor per project per day. Repeat visitors within a day don't inflate counts.
One key gotcha: the viewer day column is generated by the database from the timestamp. Do not try to insert it manually. Postgres rejects any explicit value for a generated column and the insert fails. I learned this the hard way.
The view count increment happens in a database trigger on the events table, not inside the insert function. This keeps the denormalized counter on the projects table consistent even if something bypasses the function.
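The deduplication and counter rules above can be modeled in a few lines. This is an in-memory TypeScript sketch of the behavior, not the actual Postgres schema or trigger. The `ViewLog` class and its field names are hypothetical, and the day is assumed to be taken in UTC.

```typescript
type ViewEvent = { projectId: string; viewerHash: string; viewedAt: Date };

class ViewLog {
  // Stands in for the unique constraint on (project, viewer hash, day).
  private seen = new Set<string>();
  // Stands in for the denormalized counter on the projects table.
  readonly viewCounts = new Map<string, number>();

  // Returns true if the view was recorded, false if it was a same-day duplicate.
  record(event: ViewEvent): boolean {
    // The day is always derived from the timestamp, never supplied by the caller,
    // which mirrors the generated column.
    const day = event.viewedAt.toISOString().slice(0, 10);
    const key = `${event.projectId}|${event.viewerHash}|${day}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    // Mirrors the trigger: the counter only moves when a row is actually inserted.
    const current = this.viewCounts.get(event.projectId) ?? 0;
    this.viewCounts.set(event.projectId, current + 1);
    return true;
  }
}
```

Because the counter update is tied to the insert itself, a duplicate that is rejected never touches the count.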
Every view is attributed to one of eight sources:
| Source | When it fires |
|---|---|
| explore | Clicked from the Explore page |
| search | Clicked from search results |
| profile | Clicked from a creator's public profile |
| feed | Clicked from the home dashboard feed |
| spotlight | Clicked from the GiantRuby spotlight feature |
| editors_choice | Clicked from Editor's Choice or the weekly digest email |
| external | Arrived from outside the platform |
| direct | No referrer, no from param |
Every internal navigation point injects a `?from=` parameter into the project URL. On the project page, a source detector reads this first, then falls back to `document.referrer`. After capture, the param is removed from the URL so it doesn't persist in the browser history.
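A sketch of what such a detector could look like, written as pure functions so it can run outside the browser. The names `detectSource` and `stripFromParam` are hypothetical. One assumption worth flagging: a same-site referrer with no `from` param is treated as `direct` here, since the table above doesn't cover that case.

```typescript
type InternalSource =
  | "explore" | "search" | "profile" | "feed" | "spotlight" | "editors_choice";
type TrafficSource = InternalSource | "external" | "direct";

const INTERNAL_SOURCES: readonly InternalSource[] = [
  "explore", "search", "profile", "feed", "spotlight", "editors_choice",
];

function isInternalSource(value: string): value is InternalSource {
  return (INTERNAL_SOURCES as readonly string[]).indexOf(value) !== -1;
}

// pageUrl is the full project URL; referrer is what document.referrer would hold.
function detectSource(pageUrl: string, referrer: string): TrafficSource {
  const url = new URL(pageUrl);
  const from = url.searchParams.get("from");
  // 1. A recognized ?from= value wins.
  if (from !== null && isInternalSource(from)) return from;
  // 2. Otherwise fall back to the referrer.
  if (referrer === "") return "direct";
  try {
    // Assumption: same-host referrer without ?from= counts as direct.
    return new URL(referrer).host === url.host ? "direct" : "external";
  } catch {
    return "direct"; // Unparseable referrer: treat as missing.
  }
}

// Returns the URL without ?from=, leaving any other params intact.
function stripFromParam(pageUrl: string): string {
  const url = new URL(pageUrl);
  url.searchParams.delete("from");
  return url.toString();
}
```

In the browser, the cleaned URL would be passed to `history.replaceState` so the address bar updates without adding a history entry.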
Rather than running complex aggregation queries at request time, I built a Postgres view that pre-aggregates everything: total views, downloads, wishlist count, 7-day and 30-day breakdowns, unique viewer counts, audience split between authenticated and guest users, per-source traffic counts, and JSON arrays of daily data for sparkline charts.
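To make the shape of that output concrete, here is an in-memory TypeScript model of the same kind of aggregation over view rows. It is an illustration only: the real work happens in SQL, and the `ViewRow` and `ProjectStats` field names are assumptions, not the actual column names.

```typescript
type ViewRow = { viewerHash: string; viewerDay: string; source: string; isAuthenticated: boolean };

type ProjectStats = {
  totalViews: number;
  views7d: number;
  views30d: number;
  uniqueViewers: number;
  authenticatedViews: number;
  guestViews: number;
  bySource: Record<string, number>;
  daily: { day: string; views: number }[]; // feeds the sparkline
};

// `today` and `viewerDay` are YYYY-MM-DD strings, interpreted as UTC dates.
function summarize(rows: ViewRow[], today: string): ProjectStats {
  const dayMs = 24 * 60 * 60 * 1000;
  const todayMs = Date.parse(`${today}T00:00:00Z`);
  const ageInDays = (day: string): number =>
    Math.round((todayMs - Date.parse(`${day}T00:00:00Z`)) / dayMs);

  const bySource: Record<string, number> = {};
  const perDay = new Map<string, number>();
  const viewers = new Set<string>();
  let views7d = 0;
  let views30d = 0;
  let authenticatedViews = 0;

  for (const row of rows) {
    const age = ageInDays(row.viewerDay);
    if (age >= 0 && age < 7) views7d += 1;
    if (age >= 0 && age < 30) views30d += 1;
    if (row.isAuthenticated) authenticatedViews += 1;
    viewers.add(row.viewerHash);
    bySource[row.source] = (bySource[row.source] ?? 0) + 1;
    perDay.set(row.viewerDay, (perDay.get(row.viewerDay) ?? 0) + 1);
  }

  // Daily series sorted oldest first.
  const daily = Array.from(perDay, ([day, views]) => ({ day, views }))
    .sort((a, b) => (a.day < b.day ? -1 : a.day > b.day ? 1 : 0));

  return {
    totalViews: rows.length,
    views7d,
    views30d,
    uniqueViewers: viewers.size,
    authenticatedViews,
    guestViews: rows.length - authenticatedViews,
    bySource,
    daily,
  };
}
```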
The JSON arrays feed directly into SVG sparklines on the frontend. No chart library, just path generation from the data points.
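Path generation for a sparkline is short enough to show. This is a minimal sketch, assuming a plain polyline scaled to the min and max of the series. The `sparklinePath` name and the one-decimal rounding are choices made for this example.

```typescript
// Builds the `d` attribute for an SVG <path> from a series of daily counts.
function sparklinePath(values: number[], width: number, height: number): string {
  if (values.length === 0) return "";
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min || 1; // Avoid dividing by zero on a flat series.
  const stepX = values.length > 1 ? width / (values.length - 1) : 0;

  return values
    .map((value, i) => {
      const x = i * stepX;
      // SVG's y axis points down, so larger values get smaller y.
      const y = height - ((value - min) / range) * height;
      return `${i === 0 ? "M" : "L"}${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(" ");
}

// sparklinePath([0, 5, 10], 100, 20) returns "M0.0,20.0 L50.0,10.0 L100.0,0.0"
```

The returned string goes straight into `<path d="...">` with a stroke and no fill.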
The API route checks that the requesting user owns the project before returning data. Row level security enforces this at the database level too.
Per-project:
Creator overview:
The all-time sparkline currently uses the last 30 days as a proxy, so its label is misleading, and I know it. The right fix is monthly buckets.
Trusted domain tracking: Steam, Google Play, and App Store links skip the safety warning modal, which means they also skip download tracking. I need a background beacon that fires before the external redirect. Not shipped yet.
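One possible shape for that fix, sketched under assumptions since nothing is shipped: queue a beacon, then redirect regardless of whether it was accepted. The endpoint path, the payload, and the `trackThenRedirect` name are all hypothetical. The sender and navigator are passed in so the function can be exercised outside a browser.

```typescript
type SendBeacon = (url: string, body: string) => boolean;
type Navigate = (url: string) => void;

// Hypothetical endpoint; the post does not name one.
const DOWNLOAD_BEACON_URL = "/api/track/download";

// Queues a download event, then navigates. Tracking must never block the redirect.
function trackThenRedirect(
  projectId: string,
  targetUrl: string,
  sendBeacon: SendBeacon,
  navigate: Navigate,
): boolean {
  const body = JSON.stringify({ projectId, targetUrl });
  let queued = false;
  try {
    queued = sendBeacon(DOWNLOAD_BEACON_URL, body);
  } catch {
    queued = false; // A failed beacon is a lost data point, not a broken link.
  }
  navigate(targetUrl);
  return queued;
}

// Browser wiring (not executed here):
// trackThenRedirect(id, url,
//   (u, b) => navigator.sendBeacon(u, b),
//   (u) => window.location.assign(u));
```

`navigator.sendBeacon` is a reasonable fit because the browser delivers the request in the background even after the page has navigated away.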
There's also a file in my codebase that creates its own Supabase client instance instead of importing the shared singleton. It triggers the "Multiple GoTrueClient instances detected" warning in the console. Low priority but annoying.
Anyone else building analytics in-house? Curious how you handle deduplication and source attribution.