August 12, 2019

Looking for ideas on building a "custom search engine"

I'm currently writing a book, which requires a lot of research. So far I've bookmarked around 150-200 websites that are perfect for my industry/niche and research needs. I'd love to be able to search for various keywords but only get results from these sites. Google CSE seems limited results-wise, and when I tried it I wasn't having much luck regardless. Are there any other solutions out there that I could use or even bootstrap together? I was even thinking I could go as far as mass-scraping these websites into a local copy and building some sort of internal search engine. That seems quite intense, although it could be far more affordable, because I feel like any crawling services are costly.
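To make the "local copy plus internal search engine" idea a bit more concrete, here is a rough sketch of what the search half might look like. It assumes the page text has already been downloaded somehow (the scraping step is left out entirely), and the names `LocalIndex`, `add_page`, `search`, and the `example.com` domains are made up for illustration. It is a plain inverted index restricted to an allowlist of domains, not a full search engine with ranking.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class LocalIndex:
    """Tiny inverted index that only accepts pages from allowed domains."""

    def __init__(self, allowed_domains):
        # Hypothetical allowlist: in practice, the 150-200 bookmarked sites.
        self.allowed = set(allowed_domains)
        # Maps each term to the set of URLs whose text contains it.
        self.postings = defaultdict(set)

    def add_page(self, url, text):
        """Index one already-downloaded page; return False if off-list."""
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host not in self.allowed:
            return False
        for term in tokenize(text):
            self.postings[term].add(url)
        return True

    def search(self, query):
        """Return sorted URLs that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


index = LocalIndex({"example.com", "example.org"})
index.add_page("https://example.com/a", "Notes on niche publishing")
index.add_page("https://example.org/b", "Publishing checklist for authors")
index.add_page("https://unrelated.net/c", "Publishing news")  # skipped
print(index.search("publishing"))
```

Queries here are AND-only and results are unranked, so for a few hundred sites it would probably need scoring (for example by term frequency) and on-disk storage before it is pleasant to use.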

  1. 1

    I'm actually building https://webchronicler.io. It's currently in beta and pretty much solves this use case. It indexes your browsing history and lets you search with any keyword you remember, including in PDFs. It does some automatic categorization, and you can also add notes and tags.

    Not sure what format your current bookmarks are in. If you happen to find the tool useful, you can just shoot me a mail at [email protected] and I can figure out a way to import your bookmarks.

    1. 1

      Hi Jaymu,

      Just installed and will be testing your app, nice work!

      Will keep you posted.

      Chris

      1. 1

        Let me know if you have any feedback!

        1. 1

          You should filter out sites like Spotify that seem to trigger page history entries (because each song name changes the page title). I leave it running all day when coding, so I am getting swamped with song titles. Maybe group them together?

          1. 1

            Thanks for checking it out @chrisribe. You can actually block sites based on any pattern. But that's an interesting idea, to group similar pages based on title/content/website. It would definitely make scanning your history a lot easier.

  2. 1

    You can try https://www.algolia.com. I have heard positive things about them, but I am not sure how easily they would handle your use case.