Here's a number I didn't want to believe the first time I saw it. Most freelancers, per 2025 industry surveys, bill about 65% of the hours they actually work. The best ones hit 85 or 90. For someone billing $75 an hour at 30 billable hours a week, the gap between average and good is somewhere between $13,000 and $32,000 a year. It isn't clients haggling. It isn't scope creep. It's leakage, and almost all of it happens in the space between the apps freelancers use to run their business.
I know this because I'm a developer, I've read the studies, I've nodded at the statistics, and I still had four separate tools open to answer one question: how much did I actually work this week.
Todoist for what I said I'd do.
Toggl for what I actually timed.
Google Calendar for what I'd committed to.
A spreadsheet for what I was going to bill.
Each of those tools, individually, was working correctly. That's the part that took me a long time to see. Toggl was recording the minutes I told it to record. Todoist was tracking the tasks I typed in. The spreadsheet was doing the arithmetic I fed it. None of the errors lived inside any single tool. All of them lived in the gaps between.
Every Friday I spent about 45 minutes reconciling. Did the hours I logged in Toggl match the tasks I checked off in Todoist? Did my calendar reflect what I'd actually worked on, or just what I'd planned? Was the billing total right? The answer was always "mostly." Which is a terrible answer for a billing number. "Mostly right" compounded across fifty weeks becomes a specific, measurable amount of money I did not earn.
The two places the leak reliably shows up are worth naming, because once you see them you cannot unsee them.
The first is time I worked but never started a timer for. A four-minute email reply to a client on Tuesday morning. A fifteen-minute "quick question" call. The moment I switched contexts to answer a Slack ping from a different project. Individually these look trivial. 2025 data puts manual time-tracking at roughly 70 to 80% accuracy versus 90 to 98% for automated capture, which means about 15 to 20% of every working day goes uncounted, and the uncounted slice is not in one place. It's crumbs under every rug.
The second is time I did track, but never attributed to anything billable. A thirty-minute rabbit hole researching a library I ended up not using. Context-rebuilding after a meeting. Template writing. The bucket of "not quite this project, not quite that one." It shows up in Toggl as a blob labeled "misc" and it never makes it onto a client's invoice because by Friday I cannot honestly say what it was for. 2025 reporting shows freelancers lose 15 to 40% of their billable hours to exactly this kind of categorization drift, and 31% of freelancers report losing measurable income each year because the categorization got messy.
I built Flowly because the version of me that spent Fridays reconciling four apps was, reliably, 20% wrong about my own income. Not because the apps were bad. They were excellent. Because the reconciliation was happening in my head, on a Friday afternoon, from memory, and memory has a specific decay curve that nobody should be basing their income on.
Flowly puts tasks and time tracking in the same place from the start. Write a task, start a timer, stop when you're done, and the hours attach themselves to the task automatically. The weekly view shows what you actually worked on, sorted by project, without a Friday ritual. The analytics compare planned time against real time, so the "misc" bucket stops being a bucket and starts being a question with an answer.
None of that is magic. The magic is that there's one source of truth instead of four, and the error that used to live in the gaps has nowhere to live anymore.
Here's the number I'd ask any freelancer to check before deciding whether a unified workspace is worth anything to them. For one full week, track your time as carefully as you can, then reconcile it at the end of the week against your calendar and your task list. Look at the delta. If it's under 5%, your current stack is working. If it's 15 or 20%, which is where most freelancers land, that gap is a specific amount of money that keeps leaving your account every month, and closing it is cheaper than most people assume.
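For anyone who wants to run that check with actual arithmetic, here is a minimal sketch. It assumes one particular reading of "delta": the share of hours you reconstruct as worked (from calendar, task list and timer together) that never became billable, attributed time. The function names, the 50-week year and the sample figures are all illustrative assumptions, not numbers from any survey.

```python
def weekly_delta(worked_hours: float, billable_hours: float) -> float:
    """Share of worked hours that never ended up as billable, attributed time."""
    if worked_hours <= 0:
        raise ValueError("worked_hours must be positive")
    return (worked_hours - billable_hours) / worked_hours


def annual_cost(worked_hours: float, billable_hours: float,
                hourly_rate: float, weeks_per_year: int = 50) -> float:
    """Weekly unbilled hours, priced at the hourly rate and annualized."""
    return (worked_hours - billable_hours) * hourly_rate * weeks_per_year


# Hypothetical week: 36 hours reconstructed as worked, 30 of them billable.
delta = weekly_delta(worked_hours=36.0, billable_hours=30.0)
cost = annual_cost(worked_hours=36.0, billable_hours=30.0, hourly_rate=75.0)
print(f"delta: {delta:.1%}, annualized: ${cost:,.0f}")
```

Swap in your own week and rate. The useful part is not the formula but seeing the weekly gap expressed as a yearly figure.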
If you've actually run this check on yourself, I'd like to know what number you landed on. The 15 to 20% figure comes from industry surveys. Real numbers from real freelancers tend to be more useful.
flowly.run. Free tier. 14-day Pro trial, no card required.
The Friday reconciliation ritual is painfully familiar. I've been doing the same thing for two years — Toggl, Notion, calendar, then a spreadsheet to tie it together. The part that hit hardest is that none of the tools were broken. The leakage wasn't a software problem, it was a gaps problem. That's a much harder thing to name and an even harder thing to fix without changing the whole system.
"None of the tools were broken" is exactly the thing that makes it so hard to act on. If Toggl was giving you wrong numbers you'd switch to a different timer. But when the error lives between four correct tools, there's nothing obvious to fix. The system feels like it's working right up until Friday afternoon when you're trying to remember what that two-hour block on Tuesday actually was.
The 15-20% leakage figure is the one I needed to see written down. I knew it was happening but I'd never done the math on what it actually costs annually. Running the numbers for my own rate and hours — it's uncomfortable. The 'misc' bucket in Toggl has been quietly eating my income for three years and I called it 'just how freelancing works.'
"Just how freelancing works" is the most expensive phrase in a freelancer's vocabulary. The leakage feels like friction until you multiply it by 50 weeks and an hourly rate. Then it stops feeling like friction and starts feeling like a specific number you've been leaving on the table. The uncomfortable part is that the money didn't go anywhere dramatic — it just evaporated in four-minute emails and unattributed rabbit holes.
That “everything works but the system doesn’t” problem is real.
Feels like the loss isn’t from bad tools, but from having to mentally reconcile multiple sources of truth.
Curious — do you think this is mainly a tooling problem, or more of a behavior/discipline problem?
Tooling, mostly — but it's easy to misread as discipline because the person doing the reconciling is the most visible part of the failure.
When four correct tools produce one wrong answer, the natural diagnosis is "I need to be more organized." The actual diagnosis is "I built a system that requires me to do the work the tools won't do for each other." Discipline can't close a structural gap, it just masks it temporarily.
That said, behavior is part of it at the edges — the four-minute email you don't start a timer for is a habit problem. But the Friday reconciliation ritual is pure architecture. No amount of discipline fixes a system with four sources of truth.
That’s a great way to frame it.
Feels like discipline gets blamed because it's visible, but the real issue is invisible system design.
People try to fix themselves instead of fixing the structure.
Exactly. And the feedback loop makes it worse — when the system fails, the person experiences it as a personal failure. So they try harder, add more discipline, maybe buy a better planner. The structure stays broken and now they also feel bad about it.
The system never gives you evidence that it's the problem. That's what makes it so sticky.
leakage in the gaps - I hit this hard last year building my own PM tools. three agents doing their jobs while silently duplicating work because I never spec’d the handoffs between them. fixing the apps wasn’t the answer.
The handoff problem is exactly the same shape. Each agent performing correctly while the system produces wrong output — the error isn't in any node, it's in the edges nobody designed.
The difference with agents is the failure mode is quieter. A human doing Friday reconciliation at least knows something feels off. Agents just keep going.
What finally made the duplication visible?
quieter failure is the worst part. human dropped handoffs show up as friction - someone's frustrated, you find out in Slack. agents just proceed confidently to the wrong place and nobody flags it.
Confident wrongness is so much harder to catch than frustrated wrongness. Friction is a signal. Smooth execution in the wrong direction is invisible until you look at the output and trace it back.
How far down the wrong path did it usually get before you caught it?
Losing $13k/year with 4 working apps is such a good problem framing. Curious — was the fix more about the billing logic or the monitoring itself?
Neither, really — that's the counterintuitive part. The apps didn't need new logic and there was nothing to monitor. The fix was removing the gap between them entirely.
When tasks and time live in the same place the reconciliation just disappears. There's no billing logic to fix because the hours attach to the task the moment you stop the timer. No monitoring because there's nothing to reconcile after the fact.
The problem looked like a data problem. It was actually a structure problem.
"The error living in the gaps" is a killer line.
I had a similar wake-up call a few months ago. I spent nearly three weeks rebuilding my dashboard because churn was spiking and I was convinced my UI was the problem. Ngl, I felt like an idiot when I finally checked the raw payment logs.
The product was fine.
Almost a third of those "lost" users hadn't actually quit. Their credit cards had just failed or expired, and I was too busy obsessing over heatmaps to notice the plumbing was leaking. It’s wild how we jump to "I need more features" when the real issue is just boring, invisible billing gaps.
We spend so much time on the "active" part of the business that we completely ignore the silent decay in the system architecture. And honestly? If you aren't separating your "I hate this app" churn from your "my card failed" churn, you're making product decisions based on ghost data.
When was the last time you actually filtered your churn numbers to see how many people left against their will?
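The split described above (churn by choice versus churn by failed payment) can be sketched roughly like this. The record shape and the reason codes are hypothetical; a real payment provider will expose its own fields, so treat this as an illustration of the filter rather than an integration.

```python
from collections import Counter

# Hypothetical cancellation records; field names and reason codes are assumptions.
cancellations = [
    {"user": "u1", "reason": "user_cancelled"},
    {"user": "u2", "reason": "card_expired"},
    {"user": "u3", "reason": "payment_failed"},
    {"user": "u4", "reason": "user_cancelled"},
    {"user": "u5", "reason": "user_cancelled"},
    {"user": "u6", "reason": "card_expired"},
]

# Reasons treated as "left against their will" in this sketch.
INVOLUNTARY = {"card_expired", "payment_failed"}


def split_churn(records):
    """Return the share of voluntary and involuntary cancellations."""
    counts = Counter(
        "involuntary" if r["reason"] in INVOLUNTARY else "voluntary"
        for r in records
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: counts[kind] / total for kind in ("voluntary", "involuntary")}


shares = split_churn(cancellations)
print(shares)
```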
Three weeks on heatmaps when the answer was in the payment logs — that's the same trap as my Friday reconciliation. The visible problem gets all the attention because it feels like the problem. The boring infrastructure failure sits there quietly being the actual problem.
The ghost data point is the one that stings. If you're making feature decisions based on churn that includes involuntary cancellations, you're optimizing against a signal that was never real. You might be building the wrong things for years and never know why they don't move the needle.
I haven't filtered Flowly's churn that way yet. Doing it this week.
what happened next?
That's a pretty open question — what part are you curious about?
The framing of "leakage" vs. "errors" is genuinely clever — it reframes the problem from tools being broken to tools having gaps between them. That's a much more honest diagnosis.
As a builder who also wears multiple hats, the hidden overhead in reconciling data across tools is massively underestimated. The 15–20% figure resonates; I've seen the same pattern. Excited to follow Flowly's journey — the insight that most people are in denial about this number is a powerful hook.
Denial is the right word. The leakage doesn't feel like a problem because nothing is visibly broken — and that's exactly what makes it expensive. You can't fix what you haven't named.
The multi-hat context makes it worse too. Every role switch is another reconciliation the tools don't do for you.
What's the gap that costs you the most time right now?
The "mostly right" line stung because I've felt it too. For me it was with market research: being mostly sure an idea has demand. Built a small tool to validate ideas faster, but reading this makes me think I should look at my own time tracking gaps too.
Really appreciate how you named the enemy: leakage. What's one small change you made that cut the biggest chunk of that 20%?
Stopping the Friday reconciliation entirely and replacing it with a rule: never stop a timer without attaching it to a task first. That single constraint forced attribution to happen in the moment when the context was still fresh instead of four days later when it was gone.
It didn't eliminate the leakage completely but it moved the decision to the right time. The "misc" bucket shrank because I stopped giving myself permission to defer the attribution question.
What does your validation tool surface that gut feel misses?
This hit me hard — especially the part about “the error living in the gaps between tools.”
I’ve always suspected the leakage wasn’t inside Toggl/Todoist/Calendar, but in the reconciliation jump my brain tries to do at the end of the week.
The 15–20% loss range actually feels right. For me, the biggest leak is the stuff that never gets timed at all — 3-minute emails, context switching, Slack pings.
Curious: when you ran your own weekly delta calculation, what number did you get before building Flowly?
Love the clarity in this post. The psychology of “Friday memory decay” is real.
My own delta before building Flowly was somewhere around 18 to 20% consistently. The biggest slice wasn't the dramatic leakage — it was exactly what you described, the three-minute emails and context switching that never got a timer started because starting a timer felt like more friction than the task itself.
The psychology of not timing small things is its own problem. The mental cost of "should I start a timer for this four-minute email" is sometimes higher than just doing the email. So you do the email, and the four minutes disappears.
That's actually why the Flowly quick-add is designed the way it is — lower the friction of creating a task and starting a timer to the point where it costs less than the decision not to.
What's your biggest source of untracked time — the small tasks or the context switching between projects?
For me it’s more the context switching between projects.
The small tasks exist, but they feel almost negligible in the moment. The real loss is when I switch contexts — even if it’s just a quick interruption — because it takes time to fully get back into the original flow state.
That “re-entry cost” is where most of the untracked time seems to hide for me.
The re-entry cost is the hidden tax nobody measures because it doesn't show up as a discrete time block. The interruption takes three minutes but the cost is the twenty minutes it takes to fully rebuild context afterward. That twenty minutes usually goes into "misc" or just disappears entirely.
The crumbs framing breaks down a bit for context switching because it's not really a crumb — it's more like a tax on every transition. The more projects you're running in parallel, the higher the total tax, and it compounds in a way that four-minute emails don't.
Does Flowly solve that specific leak? Partially. Keeping tasks and time in the same place means when you switch projects you at least have a record of where you were. The context rebuilding still happens but you're not starting from memory alone.
Got it — the way you described the context-switching tax really resonated.
Makes sense why Flowly helps with the re-entry even if it can’t remove the mental reset entirely.
Thanks for taking the time to explain — learned a lot from this thread.
“None of the errors lived inside any single tool. All of them lived in the gaps between.” That’s the sentence that explains half of all productivity problems and most billing problems. The reconciliation ritual you’re describing is a symptom of a system design problem, not a discipline problem. Most people treat the Friday 45 minutes as evidence they need to be more organized. You correctly identified it as evidence the tools don’t share a source of truth. The “crumbs under every rug” framing for untracked time is the most accurate description of that problem I’ve read. It’s not one big leakage event you’d notice. It’s dozens of four-minute moments that individually look too small to matter and collectively become $13,000.
"Asking memory to do accounting" and now "the error living in the gaps" — you've been the sharpest commenter in this thread and I mean that genuinely.
The system design framing is the one I wish I'd led with. Most productivity advice treats the Friday ritual as a habit problem. Buy a better planner, be more disciplined, set reminders. None of that works because the problem isn't behavior, it's architecture. Four correct tools with four separate sources of truth will always produce reconciliation debt, regardless of how disciplined the person using them is.
The crumbs framing came from trying to explain why the leakage feels invisible until you measure it. No single four-minute email feels like a problem. The pattern across fifty weeks is where the number lives.
What do you do?
That framing came from recognizing the same pattern in what I’m building. The invisible leakage problem where nothing looks broken because each individual piece works is exactly what happens on the user communication side too. Founders build great products and tell nobody. No single missed update kills retention. The pattern across fifty users who churned quietly is where the number lives.
I’m building ReleaseLog, a changelog and roadmap tool for indie founders. Same instinct of fixing the gap rather than patching the symptom. tryreleaselog.com
What’s the hardest part to get right with Flowly, the data model or the behavior change of getting people to actually consolidate?
The behavior change, by a significant margin.
The data model has hard problems but they're solvable. You cannot design your way out of a habit that took three years to form. People already have a system — fragmented and leaky but theirs. The switching cost feels real before the benefit does, and most people never run the delta check that would make the leakage concrete.
Same question back: is the harder problem getting founders to write the updates, or getting users to actually read them?
Getting users to read is the harder problem, and it’s underrated because founders assume publishing is the finish line. You write the update, hit publish, feel productive. Whether anyone opened it is a different question most never ask.
The writing side is solvable with enough friction reduction: if the tool meets you where the thought already exists, rough notes become a real update in two minutes. But you can’t friction-reduce your way into someone’s inbox habits. The founders who retain readers are the ones whose updates feel like signal, not noise, every single time. One irrelevant update and the next one gets skimmed.
The delta check framing is exactly right for your problem too. People don’t feel the leakage until you make the number concrete. How are you thinking about surfacing that moment for new Flowly users?
That signal-to-noise point is underrated. One irrelevant update doesn't just get skimmed — it changes how the reader classifies everything that comes after it. The trust is harder to rebuild than it was to lose.
On surfacing the moment: the current approach is the delta check built into onboarding. Track one week, then show the gap between logged hours and what's actually attributable. Most people see 15-20% and that's when it stops being abstract. The number does the selling that the feature list doesn't.
The harder version of your problem is that the gap is invisible until someone churns. At least with time tracking the leak is in data that exists — you just have to look at it. How do you make the cost of a missed update concrete before the user is already gone?
That’s the hardest version of the problem and I don’t have a perfect answer yet. The churn is silent by definition: you never get the email that says “I left because I didn’t know you’d fixed the thing that was frustrating me.”
The closest thing I have to a delta check is subscriber open rates on changelog updates. If someone who submitted a feature request opens the email announcing it shipped, that’s a signal the loop closed. If they don’t, the update may as well not have happened for that user. The problem is you’re measuring after the fact rather than surfacing the gap in advance.
The honest version is that the cost only becomes concrete in retrospect: you look at churned users and realise some percentage never opened a single update. At that point it’s too late for them but it’s useful data for everyone still active. I’m still working out how to make that visible before someone’s already halfway out the door.
Your delta check in onboarding is smarter because you’re creating the moment of realisation before the user has decided anything. I need an equivalent, probably something like showing a founder how many of their active users have never seen a changelog entry.
That last idea is the one. How many active users have never seen a single update — that's the number that makes the cost concrete before anyone churns. It's the same mechanic as the delta check: you're not telling them there's a problem, you're showing them a number that makes the problem undeniable.
The retrospective data on churned users is useful but it's a postmortem. The active user version is actionable. Someone looks at that number and immediately knows what to do next.
The silent churn problem might not be fully solvable in advance, but that metric gets you as close as the data allows.
That's the feature. You just described it more clearly than I had it in my own head: the number of active users who've never seen a single update, surfaced in the dashboard before anyone churns. Not a postmortem, not a retrospective, an active signal that tells you the communication gap is open right now. Going on the roadmap today. This whole conversation has been more useful than a week of solo thinking. What does your equivalent leading indicator look like in Flowly? Is the delta check the one number, or are there others behind it?
Is there anywhere else I can follow you, like X? Would love to follow the journey.
The delta check is the entry point but the more useful number over time is the ratio of tracked hours to billed hours per project. The delta tells you the system is leaking. The per-project ratio tells you where.
Some projects consistently absorb unbilled time and you don't see it until you're looking at both numbers together. That's the one that tends to change how people price.
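A rough sketch of that per-project ratio, assuming you can export entries as (project, tracked hours, billed hours). The tuple layout and the sample data are made up for illustration; this is not Flowly's export format.

```python
from collections import defaultdict

# Hypothetical weekly entries: (project, tracked_hours, billed_hours).
entries = [
    ("client-a", 12.0, 11.5),
    ("client-b", 10.0, 7.0),
    ("client-a", 6.0, 6.0),
    ("client-b", 4.0, 3.5),
]


def tracked_to_billed(rows):
    """Tracked hours divided by billed hours, per project.

    1.0 means every tracked hour was billed; higher values mean the
    project is absorbing unbilled time.
    """
    tracked = defaultdict(float)
    billed = defaultdict(float)
    for project, tracked_hours, billed_hours in rows:
        tracked[project] += tracked_hours
        billed[project] += billed_hours
    return {
        p: (tracked[p] / billed[p]) if billed[p] > 0 else float("inf")
        for p in tracked
    }


ratios = tracked_to_billed(entries)
# Worst offenders first.
for project, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{project}: {ratio:.2f} tracked hours per billed hour")
```

Sorting by the ratio puts the projects that quietly eat time at the top, which is the view that tends to prompt a pricing conversation.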
Glad the conversation was useful — same on my end, genuinely. You can find me at @flowly_run on X. Would like to see where the active user metric lands once it ships.
Curious how Flowly handles edge cases like async work or tiny interruptions that don’t feel worth starting/stopping a timer for. That’s usually where most systems quietly break down. Either way, solid breakdown.
Flowly is built around one honest timer at a time—when you switch tasks it stops the previous block and starts the new one so nothing overlaps. If you forgot to stop or were away from the keyboard, you can stop with a custom end time so the log matches real work, not idle time. What it doesn’t try to do is auto-capture lots of tiny one-off minutes without a timer—those still need a quick start/stop or a manual trim on stop. We’re not pretending to fix every crumb of untracked time; we’re trying to keep the numbers you do log trustworthy.
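To make the "one timer at a time" behavior concrete, here is a small sketch of how such a log could work. This is an illustration under my own assumptions, not Flowly's actual code: the class names `Block` and `TimerLog` and their methods are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Block:
    task: str
    start: datetime
    end: Optional[datetime] = None  # None while the timer is running


@dataclass
class TimerLog:
    blocks: List[Block] = field(default_factory=list)

    def _running(self) -> Optional[Block]:
        # Only the most recent block can still be open.
        if self.blocks and self.blocks[-1].end is None:
            return self.blocks[-1]
        return None

    def start(self, task: str, at: datetime) -> None:
        if not task:
            raise ValueError("a timer must be attached to a task")
        self.stop(at)  # switching tasks closes the previous block first
        self.blocks.append(Block(task, at))

    def stop(self, at: datetime) -> None:
        # `at` can be a custom end time, e.g. to trim idle minutes.
        running = self._running()
        if running is None:
            return
        if at < running.start:
            raise ValueError("end time cannot precede start time")
        running.end = at


log = TimerLog()
log.start("client-a: fix login bug", datetime(2025, 3, 4, 9, 0))
log.start("client-b: review PR", datetime(2025, 3, 4, 9, 40))  # closes the first block
log.stop(datetime(2025, 3, 4, 10, 15))  # custom end time
```

Because `start` always closes whatever was running, two blocks can never overlap, and because a task name is required, there is no way to log time that belongs to nothing.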
The "misc" bucket problem is real. The issue isn't discipline — it's that attribution decisions made at the end of the week are being made by a brain that's already moved on. You're asking memory to do accounting.
Ran the check you described once. My delta was 22%. The majority wasn't untracked time — it was tracked time I couldn't honestly attribute to anything billable by Friday. The hours existed, the label didn't.
One question: how does Flowly handle the case where one block of work genuinely touches two projects? That's where my attribution always broke down — not the clean cases, but the 40-minute session that was half client A, half client B.
"Asking memory to do accounting" is the sharpest line in this comment and probably the best description of the Friday reconciliation problem I've read.
On the split-session question: right now Flowly handles it by letting you stop a timer and start a new one on a different task mid-session. So the 40 minutes becomes two timers — 20 minutes on client A's task, 20 on client B's. It's still a manual split decision but you're making it in the moment when you actually know what you just did, not on Friday when you're guessing.
The harder case you're describing is when the work genuinely overlaps — a conversation that touched both projects simultaneously. That one doesn't have a clean answer yet. Honest about that. The current approach is to attribute it to whichever project drove the conversation and note it. Not perfect but better than a "misc" bucket with no context at all.
What was your previous system for handling the split cases before you gave up and called it misc?
Honest answer: I didn't have one. The split decision happened at timer-stop, which meant it happened under pressure to get back to work. The path of least resistance was always "misc" with a vague note I'd never read again.
The in-the-moment split you described is actually the right architecture. The problem with Friday reconciliation isn't just memory decay — it's that by then you've lost the context that made the split obvious. Two minutes after finishing the call you know exactly which project drove it. Four days later you're guessing.
The simultaneous overlap case might not need a clean solution. Attributing to the driving project and noting it is probably the honest answer — trying to split it algorithmically would just move the guessing into a formula.
"Asking memory to do accounting" followed by "trying to split it algorithmically would just move the guessing into a formula" — you've diagnosed the problem and the wrong solution in the same thread. That's useful.
The in-the-moment split working better than Friday reconciliation is the core insight behind Flowly's architecture. Two minutes of context beats four days of memory every time.
The simultaneous overlap case I'll leave as "attribute to the driver and note it" for now. Clean enough to be honest, simple enough to actually do. The alternative is a complexity tax that makes the tool harder to use than the problem it's solving.
What’s interesting is you both landed on the same constraint from two sides.
“In the moment works, Friday doesn’t” isn’t just about memory decay. It’s showing that the clarity needed to label the work only exists at the point the decision is made.
Once that moment passes, you’re no longer assigning time. You’re reconstructing context, and that’s where it breaks.
So the split issue isn’t really about overlap or attribution edge cases. It’s that the work itself isn’t framed in a way that survives beyond the moment it happens.
The system isn’t losing accuracy later. It’s exposing that the original decision context wasn’t durable enough to carry forward.
"The original decision context wasn't durable enough to carry forward" is a better framing than anything I used in the post.
The way I'd been thinking about it was memory decay — the information existed, it just degraded over time. What you're describing is sharper: the information never existed in a form that could survive the moment. It was context-dependent from the start, and context isn't something you can store and retrieve later. You can only use it while it's live.
That reframe changes what the right fix actually looks like. If the problem is memory decay, you solve it with better recall — more detailed labels, richer notes, stricter naming conventions. If the context was never durable to begin with, none of that works. The only real solution is to make the decision at the moment the context exists, because nothing you write down later reconstructs what it felt like to be inside that work.
Which is why the Friday ritual was always going to fail regardless of how disciplined the person doing it was. You can't reconcile what was never captured in a transferable form.
Exactly.
Once you see that the context doesn’t survive the moment, it changes what you try to optimize for.
You’re not improving recall anymore.
You’re deciding what needs to be captured while the decision is still live.
Different system entirely.
The part about the error living in the gaps between correct tools is the clearest explanation of a problem I've had for years but couldn't name.
Once you see it that way the solution also becomes clearer. You can't fix the gaps by making the individual tools better.
Ran the check. One week, careful tracking, then reconciled. Delta was 18%. I've been doing this for four years and somehow never actually measured it before.
Four years of 18% is a number worth sitting with. The check only stings once — after that it's just information.
The 'misc' bucket description is painfully accurate. I have a Toggl entry from last Tuesday called 'stuff' that is 2.5 hours long and I genuinely cannot tell you what it was.
"Stuff" is doing a lot of work for 2.5 hours. By Friday that entry is already fiction.
Did the check you suggested. Tracked carefully for a week then reconciled against calendar and task list. Delta was 22%. That's more than one full working day per week I'm not billing for. The crumbs-under-every-rug description is accurate — no single gap is big enough to feel urgent, but together they add up to something that matters.
22% is on the higher end of what the data shows but not unusual for someone running multiple projects simultaneously. The multi-project context switching is where the biggest crumbs accumulate because the mental cost of switching never makes it onto any timer. One project ends, another starts, and the five minutes of reorientation just disappears. Over a week that adds up to something real. Over a year it's a number worth being angry about.