
Idea Validation: Data Extraction Tool using NLP

Background

I was approached by a doctor specializing in infectious diseases who asked me to extract Covid-19 patient data from local government bulletins. These are daily press releases in PDF format containing several tables and some text. I couldn't find any ready-to-use no-code tool for this, so I started experimenting with the NLP library spaCy to see whether the extraction could be automated.

After a couple of online tutorials, I was able to automate it to a certain degree. A sample is shown in the image above.

Product Idea

  1. Upload files containing the text
  2. Choose the unit to split the text into (sentences, paragraphs, pages, etc.)
  3. Tag the data on a sample (say 5 entries)
  4. Run the NLP process in the background and generate a table of the extracted information. It won't be 100% accurate.
  5. The user can then click on any row where the information is wrong, which shows the full text and lets the user tag the right bits
  6. Each correction will update the data table by rerunning the NLP process using the newly received inputs.
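
To make the loop in steps 2 to 6 concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: `ToyExtractor` is a hypothetical stand-in for the real NLP model (it is not spaCy and only memorizes the word that precedes each tagged value), the function names are made up, and `SAMPLE_TEXT` is invented text, not taken from the bulletins.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Invented sample text for illustration only (not from the real bulletins).
SAMPLE_TEXT = (
    "Patient 12 is a male aged 45 from Chennai. "
    "Patient 13 is a female aged 60 from Madurai. "
    "Patient 14, 33 years old, travelled from Delhi and lives in Salem. "
    "Patient 15 is a male aged 29 and lives in Erode."
)


def split_units(text: str, unit: str = "sentence") -> list[str]:
    """Step 2: split the uploaded text into the unit the user chose."""
    if unit == "sentence":
        parts = re.split(r"(?<=[.!?])\s+", text)
    elif unit == "paragraph":
        parts = re.split(r"\n\s*\n", text)
    else:
        raise ValueError(f"unsupported unit: {unit}")
    return [p.strip() for p in parts if p.strip()]


@dataclass
class ToyExtractor:
    """Hypothetical stand-in for the NLP model.

    For each field it remembers the word that came right before a tagged value
    (the "cue") and later extracts whatever word follows that cue.
    """

    cues: dict[str, list[str]] = field(default_factory=dict)

    def learn(self, text: str, field_name: str, value: str) -> None:
        # Steps 3 and 5: a user tag teaches the extractor a new cue word.
        match = re.search(r"\b(\w+)\W+" + re.escape(value), text)
        if match:
            cue = match.group(1).lower()
            known = self.cues.setdefault(field_name, [])
            if cue not in known:
                known.append(cue)

    def extract(self, text: str) -> dict[str, str]:
        # Step 4: try each cue in the order it was learned; first hit wins.
        row: dict[str, str] = {}
        for field_name, cues in self.cues.items():
            for cue in cues:
                hit = re.search(rf"\b{re.escape(cue)}\W+(\w+)", text, re.IGNORECASE)
                if hit:
                    row[field_name] = hit.group(1)
                    break
        return row


def build_table(units, extractor, corrections):
    """Step 4: one row per unit; values tagged by the user override the model."""
    table = []
    for index, unit in enumerate(units):
        row = extractor.extract(unit)
        row.update(corrections.get(index, {}))
        table.append(row)
    return table


def apply_correction(units, extractor, corrections, index, field_name, value):
    """Steps 5 and 6: store the fix, learn from it, and rerun on every unit."""
    corrections.setdefault(index, {})[field_name] = value
    extractor.learn(units[index], field_name, value)
    return build_table(units, extractor, corrections)


units = split_units(SAMPLE_TEXT)
extractor = ToyExtractor()
corrections: dict[int, dict[str, str]] = {}

# Step 3: the user tags one sample entry.
extractor.learn(units[0], "age", "45")
extractor.learn(units[0], "district", "Chennai")

first_pass = build_table(units, extractor, corrections)
print(first_pass)

# Step 5: the third row has the wrong district, so the user tags the right one.
second_pass = apply_correction(units, extractor, corrections, 2, "district", "Salem")
print(second_pass)
```

In this sketch the correction on the third row also fills in the district of the fourth row on the rerun, because the extractor picked up a new cue word from the fix. A real model would generalize very differently, but the shape of the loop (tag, extract, correct, rerun) would likely stay the same.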

Kindly provide your feedback.

Posted to Ideas and Validation on November 25, 2020