
Idea Validation: Data Extraction Tool using NLP

Background

I was approached by a doctor specializing in infectious diseases who wanted to extract Covid-19 patient data from local government bulletins. These are daily press releases in PDF format with a bunch of tables and some text (source files). I couldn't find any ready-to-use no-code tools, so I started experimenting with the NLP library spaCy to see if the extraction could be automated.

A couple of online tutorials later, I was able to automate it to a certain degree. A sample is shown in the image above.

Product Idea

  1. Upload files containing the text
  2. Choose how the text should be split into units (sentences, paragraphs, pages, etc.)
  3. Tag the data on a sample (say 5 entries)
  4. Run the NLP process in the background and generate a table of the extracted information. It won't be 100% accurate.
  5. The user can then click on any row where the information is wrong, which shows the full text and lets the user tag the right bits
  6. Each correction updates the data table by rerunning the NLP process with the newly received inputs.
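
To make the loop in steps 2 to 6 a bit more concrete, here is a minimal sketch in plain Python. It is not the spaCy prototype mentioned above: a toy "cue word" matcher stands in for the NLP model, and every function name and the sample text are made up for illustration. The only point is to show the tag, extract, correct, rerun cycle.

```python
import re

def split_units(text, unit="sentence"):
    """Step 2: split raw text into the chosen unit."""
    if unit == "paragraph":
        parts = re.split(r"\n\s*\n", text)
    elif unit == "page":
        parts = text.split("\f")  # assumes pages are separated by form feeds
    else:
        parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]

def learn_cues(tagged):
    """Step 3: from (text, field, value) tags, remember the word before each value.
    This is a toy stand-in for training an NLP model on the tagged sample."""
    cues = {}
    for text, field, value in tagged:
        pos = text.find(value)
        if pos <= 0:
            continue  # value missing or at the very start: nothing to learn
        words = text[:pos].split()
        if words:
            cues.setdefault(field, set()).add(words[-1].lower())
    return cues

def extract(units, cues):
    """Step 4: build one table row per unit; a field stays None if no cue matches."""
    rows = []
    for text in units:
        row = {"text": text}
        for field, words in cues.items():
            row[field] = None
            for word in sorted(words):
                m = re.search(r"\b" + re.escape(word) + r"\s+(\w+)", text, re.IGNORECASE)
                if m:
                    row[field] = m.group(1)
                    break
        rows.append(row)
    return rows

# Made-up sample text, not taken from the real bulletins.
text = ("Patient 101 is aged 45 from Chennai. "
        "Patient 102 is aged 60 from Madurai. "
        "Patient 103, age 38, lives in Salem.")
units = split_units(text)

# Step 3: the user tags one sample entry.
tagged = [(units[0], "age", "45"), (units[0], "district", "Chennai")]
rows = extract(units, learn_cues(tagged))  # third row has gaps

# Steps 5 and 6: the user fixes the bad row, and the table is rebuilt.
tagged += [(units[2], "age", "38"), (units[2], "district", "Salem")]
rows2 = extract(units, learn_cues(tagged))
```

In a real version, `learn_cues` and `extract` would presumably be replaced by updating and running an actual NLP model, but the surrounding loop would likely look much the same: keep the list of user tags, and rebuild the whole table from it after every correction.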

Kindly provide your feedback.

Posted to Ideas and Validation on November 25, 2020