I am trying to build a trending feature for hashtags in my iOS app.
I have come up with a way to score certain data in my DB based on its level of engagement, but I can't think of a way to turn that score into a trend. For example, do I have to clear it after X amount of time? Or do I create batches of hashtags per week and continually switch between them?
Any ideas?
I'm working with Swift and the Firebase Realtime Database, if that makes a difference.
On a past project I used the following formula, inspired by HN and other literature I found on the subject. It's quite simple but did the job for a while.
ranking = (numberOfComments * commentsWeight + numberOfLikes * likesWeight) / (daysSincePost * gravity)
gravity = 0.2
commentsWeight = 2
likesWeight = 1
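A minimal Python sketch of that formula, in case it helps to see it run. The function name, the argument names, and the one-hour floor on the post's age are my own assumptions (the formula as written divides by zero for a brand-new post), so treat it as an illustration rather than the original implementation.

```python
from datetime import datetime, timedelta, timezone

COMMENTS_WEIGHT = 2
LIKES_WEIGHT = 1
GRAVITY = 0.2
MIN_AGE_DAYS = 1 / 24  # assumption: floor the age at one hour to avoid dividing by zero


def ranking(num_comments, num_likes, posted_at, now):
    """Weighted engagement divided by (age in days * gravity)."""
    days_since_post = (now - posted_at).total_seconds() / 86400
    days_since_post = max(days_since_post, MIN_AGE_DAYS)
    engagement = num_comments * COMMENTS_WEIGHT + num_likes * LIKES_WEIGHT
    return engagement / (days_since_post * GRAVITY)


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
# One day old, modest engagement: (3*2 + 4*1) / (1 * 0.2)
fresh = ranking(3, 4, now - timedelta(days=1), now)
# Twenty days old, ten times the engagement: (30*2 + 40*1) / (20 * 0.2)
stale = ranking(30, 40, now - timedelta(days=20), now)
```

Here the newer post outranks the older one even though the older one has ten times the engagement, which is the "trending" effect the division by age is meant to give.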
So my thinking is that this ranking will be used as a child to order by when fetching. Does that mean I am constantly performing the calculation on users' phones whenever they like or comment? Is this correct?
Yes, you would order by the ranking in descending order.
If I remember correctly, this was computed server-side every hour by a cron job.
However, it will depend on the volume. Computing it on the client side means that you need to fetch everything first, which can be a huge amount of data.
With a cron job, the run takes longer and longer as the volume of content grows.
Another possibility would be to use a cron job and also update the ranking each time the content is updated (new like, new comment, etc.).
I'd suggest looking into HN or Reddit algorithms. They both take into account time + popularity.
Bear in mind it's not a simple problem!
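For reference, here is a sketch of the form of the HN ranking that is commonly cited in write-ups about it. I have not checked it against HN's actual source, so the exponent, the offsets, and the function name below should all be read as assumptions.

```python
def hn_style_score(points, age_hours, gravity=1.8):
    """Commonly cited HN-style score: popularity decays polynomially with age.

    The -1 discounts the submitter's own vote and the +2 keeps very new
    items from getting an unbounded score. Both constants are assumptions.
    """
    return (points - 1) / (age_hours + 2) ** gravity


recent = hn_style_score(points=50, age_hours=1)
older = hn_style_score(points=100, age_hours=24)
```

With these inputs the one-hour-old item scores higher than the day-old item that has twice the points, which shows how the time term dominates as content ages.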
+1 for this
If you store votes separately from the main thing that people vote on, you could count the votes from a recent time frame (e.g., the last day or the last 3 hours) and order the items by that count in descending order. This way, the highest-voted thing in the last few hours becomes trending.
Or maybe you can implement this with a unique viewer count (store a unique identifier, ideally anonymized, for each user who visited a certain item and spent more than X time there). You can expire such data after a while because you only need to keep the most recent data to apply the trending algorithm to it.
The weekly batches also work. They would reduce the load on the database, because otherwise the trending algorithm needs to scan the whole database (or a significant amount of the data) to determine what is trending and what is not.
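A rough sketch of that idea in Python, assuming the votes are available as (hashtag, timestamp) pairs. The function name, the data shape, and the 3-hour default are made up for illustration. In a real app the filtering would ideally happen in the database query rather than in memory.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def trending(votes, now, window=timedelta(hours=3), top_n=10):
    """Count only the votes cast inside the window, then rank by count.

    votes: iterable of (hashtag, voted_at) pairs, stored separately from
    the hashtags themselves (hypothetical data shape).
    """
    cutoff = now - window
    counts = Counter(tag for tag, voted_at in votes if voted_at > cutoff)
    return counts.most_common(top_n)


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
votes = [
    ("#swift", now - timedelta(minutes=10)),
    ("#swift", now - timedelta(hours=1)),
    ("#firebase", now - timedelta(minutes=30)),
    ("#old", now - timedelta(days=2)),
    ("#old", now - timedelta(days=3)),
    ("#old", now - timedelta(days=4)),
]
top = trending(votes, now)
```

Note that "#old" has the most votes overall but none inside the window, so it does not appear in the result at all.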
I think you could save specific data in DB such as creation date and number of votes and a "score" number based on those two. You could have a trigger that automatically updates the score whenever the votes change, or do it at an interval.
I think your question has to be a lot more specific. What part of the trending algorithm are you having trouble implementing? The scoring? The updating? Selecting top X items?
I am having trouble with the "window" part. How do I make it a trending list rather than an all-time most-active list?
How often would you like it to be updated?
If you can run a SQL query, you could just do
SELECT * FROM posts WHERE post_date > (NOW() - INTERVAL 24 HOUR) ORDER BY votes DESC
and if it is an expensive query, you could cache the result for a few hours or days. This should work fine unless your site has millions of posts that are frequently updated.
I don't know your stack, but this is usually done with sliding windows.
Example: https://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/
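To make the sliding-window idea concrete, here is a small in-memory sketch in Python. It is not the Storm implementation from the linked post. The class name, the bucket layout, and the assumption that timestamps arrive in non-decreasing order are all mine.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Per-hashtag counts over the last `num_buckets` fixed-size time buckets.

    Assumes timestamps (in seconds) are passed in non-decreasing order.
    """

    def __init__(self, num_buckets, bucket_seconds):
        self.num_buckets = num_buckets
        self.bucket_seconds = bucket_seconds
        self.buckets = deque()  # (bucket_index, Counter) pairs, oldest first

    def _evict(self, current_index):
        # Drop buckets that have slid out of the window.
        oldest_allowed = current_index - self.num_buckets + 1
        while self.buckets and self.buckets[0][0] < oldest_allowed:
            self.buckets.popleft()

    def record(self, hashtag, timestamp):
        index = int(timestamp // self.bucket_seconds)
        self._evict(index)
        if not self.buckets or self.buckets[-1][0] != index:
            self.buckets.append((index, Counter()))
        self.buckets[-1][1][hashtag] += 1

    def top(self, n, timestamp):
        # Sum the surviving buckets and return the n most frequent hashtags.
        self._evict(int(timestamp // self.bucket_seconds))
        total = Counter()
        for _, counts in self.buckets:
            total.update(counts)
        return total.most_common(n)


window = SlidingWindowCounter(num_buckets=24, bucket_seconds=3600)
window.record("#old", 0)
window.record("#old", 10)
window.record("#swift", 30 * 3600)
top = window.top(5, 30 * 3600)
```

Because old buckets are simply dropped, nothing has to be "cleared" by hand: a hashtag stops trending as soon as its activity slides out of the last 24 hourly buckets.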