This is the story of how a simple one-line hack brought a 100-1000x performance improvement from the user's perspective.
When a user adds a domain to our system, we immediately perform a DNS query for its MX records to see whether they point to mx1.hanami.run and mx2.hanami.run. Besides that, we have a job that runs every 2 minutes to find domains that are missing DNS and check them again.
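For readers who want to picture that MX check, here is a minimal sketch using Ruby's standard resolv library. It is not our production code: the helper names (normalize_host, mx_ready?, lookup_mx) are made up for illustration, and requiring an exact match on both hosts is an assumption.

```ruby
require "resolv"

# The two mail hosts a domain's MX records should point to (from the post).
EXPECTED_MX = %w[mx1.hanami.run mx2.hanami.run].freeze

# Lowercase the hostname and drop a trailing dot so comparisons are stable.
def normalize_host(host)
  host.to_s.downcase.chomp(".")
end

# True when the MX hosts are exactly our two mail servers.
# Assumption: any extra or missing MX host counts as "not ready".
def mx_ready?(exchanges)
  exchanges.map { |host| normalize_host(host) }.uniq.sort == EXPECTED_MX.sort
end

# Hypothetical lookup helper: returns the MX hostnames published for a domain.
def lookup_mx(domain)
  Resolv::DNS.open do |dns|
    dns.getresources(domain, Resolv::DNS::Resource::IN::MX)
       .map { |mx| mx.exchange.to_s }
  end
rescue Resolv::ResolvError
  [] # treat a failed lookup as "no MX records"
end
```

With these helpers, the check for one domain would be mx_ready?(lookup_mx("example.com")). Keeping the comparison separate from the network lookup makes it easy to test without touching DNS.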
At one point Cloudflare rate-limited us because of our heavy DNS usage (for every incoming connection we check the reverse DNS and run a bunch of DNS queries for spam filtering), so we bumped the interval to 5 minutes to make the job do less work.
That was the first thing that made it slower. But within a 5-minute window, the status should still refresh automatically. Many users have their MX record TTL set to 5 minutes anyway, so it isn't that bad.
Then a few users told us it took forever for the DNS status to refresh, and they had to refresh it themselves. We have a button that says "Recheck DNS".
We dug in and indeed, it was very slow.
Our code is very simple:
Domain.find_each do |domain|
  update_dns domain unless domain.dns_ready?
rescue StandardError => e
  Bugsnag.notify e
end
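This essentially iterates over the domains in batches, in ascending order of primary key, and attempts the DNS check on each one that isn't ready. The issue is right there: "in ascending order". We have so many domains now, and apparently a lot of them were missing DNS (a lot of users own many domains), so the system was checking the oldest ones first. A domain that had just been added sat at the very end of the queue.
Our one-line fix is: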
Domain.find_each(order: :desc) do |domain|
  update_dns domain unless domain.dns_ready?
rescue StandardError => e
  Bugsnag.notify e
end
Now it checks the newest domains first, and it works surprisingly well.
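To see why the order matters so much, here is a rough back-of-the-envelope sketch in plain Ruby. The numbers (1000 pending domains, half a second per DNS check) are made up for illustration, not measurements from our system, and the helper names are hypothetical. It also assumes that a larger id means a more recently added domain.

```ruby
# 1-based position at which the newest pending domain gets checked.
# pending_ids: ids of domains still missing DNS (assumption: larger id = newer).
def position_of_newest(pending_ids, order:)
  ordered = order == :desc ? pending_ids.sort.reverse : pending_ids.sort
  ordered.index(pending_ids.max) + 1
end

# Seconds until the newest domain has been checked, counting its own check.
def wait_for_newest(pending_ids, order:, seconds_per_check:)
  position_of_newest(pending_ids, order: order) * seconds_per_check
end

pending = (1..1000).to_a # illustrative backlog of domains missing DNS

asc_wait  = wait_for_newest(pending, order: :asc,  seconds_per_check: 0.5)
desc_wait = wait_for_newest(pending, order: :desc, seconds_per_check: 0.5)

puts "oldest first: #{asc_wait}s"  # oldest first: 500.0s
puts "newest first: #{desc_wait}s" # newest first: 0.5s
```

The total work per run is identical in both cases. Only the position of the user who is actually waiting changes, which is where the 100-1000x feeling comes from. Note that the order: option on find_each needs Rails 6.1 or later.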
The lesson for me is that, though we didn't improve anything overall, we did give users the impression that the system is faster. As a solo founder, I think this is a great hack, because it lets me focus on other components of the system.
Until next time, when we will write about how we made this faster for real.
If you need email forwarding, give us a try at https://hanami.run