Idea Validation: Data Extraction Tool using NLP

Background

I was approached by a doctor specializing in infectious diseases who wanted to extract Covid-19 patient data from local government bulletins. These are daily press releases in PDF format with a bunch of tables and some text. I couldn't find any ready-to-use no-code tools, so I have been experimenting with the NLP library spaCy to see whether the extraction can be automated.

A couple of online tutorials later, I was able to automate it to a certain degree. A sample is shown in the image above.

Product Idea

  1. Upload the files containing the text.
  2. Choose the unit to split the text into (sentences, paragraphs, pages, etc.).
  3. Tag the data on a small sample (say, 5 entries).
  4. Run the NLP process in the background and generate a table of the extracted information. It won't be 100% accurate.
  5. The user can then click on any row where the information is wrong, which shows the full text and lets them tag the right bits.
  6. Each correction updates the data table by rerunning the NLP process with the newly received input.
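
To make the loop in the steps above concrete, here is a rough sketch in plain Python. It is not the spaCy setup I actually experimented with: simple regex patterns learned from the word just before each tagged value stand in for the NLP model, and the sample sentences, the field names (`id`, `age`, `district`) and the helper names (`split_text`, `pattern_from_tag`, `extract`) are all made up for illustration.

```python
import re

def split_text(text, unit="sentence"):
    """Step 2: split the raw text into the unit the user picked."""
    if unit == "paragraph":
        parts = re.split(r"\n\s*\n", text)
    else:
        parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]

def pattern_from_tag(segment, value, context_words=1):
    """Steps 3 and 5: turn one tagged value into a regex keyed on the
    word(s) just before it. A crude stand-in for training an NLP model."""
    start = segment.index(value)
    left = re.findall(r"\w+", segment[:start])[-context_words:]
    shape = r"(\d+)" if value.isdigit() else r"(\w+)"
    if not left:
        return re.compile(shape)
    prefix = r"\W+".join(re.escape(word) for word in left)
    return re.compile(r"\b" + prefix + r"\W+" + shape, re.IGNORECASE)

def extract(segments, patterns):
    """Step 4: one table row per segment; None marks a value to review."""
    rows = []
    for segment in segments:
        row = {"text": segment}
        for name, regexes in patterns.items():
            row[name] = None
            for regex in regexes:  # first pattern that matches wins
                match = regex.search(segment)
                if match:
                    row[name] = match.group(1)
                    break
        rows.append(row)
    return rows

# Invented sample text, loosely shaped like a bulletin line list.
text = (
    "Patient 101 is a 45 year old male from Salem. "
    "Patient 102 is a 62 year old female from Erode. "
    "Case 103, aged 30 years, is a male from Madurai."
)
segments = split_text(text)

# Step 3: the user tags the first entry.
patterns = {
    "id": [pattern_from_tag(segments[0], "101")],
    "age": [pattern_from_tag(segments[0], "45")],
    "district": [pattern_from_tag(segments[0], "Salem")],
}
rows = extract(segments, patterns)  # third row is missing id and age

# Steps 5 and 6: the user fixes the third row, then the table is rebuilt.
patterns["id"].append(pattern_from_tag(segments[2], "103"))
patterns["age"].append(pattern_from_tag(segments[2], "30"))
rows_after = extract(segments, patterns)

for row in rows_after:
    print(row["id"], row["age"], row["district"])
```

In a real version the regex patterns would be replaced by a trainable model, but the shape of the loop (tag a few, extract, correct, rerun) would stay the same.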

Kindly provide your feedback.

Posted to Ideas and Validation on November 25, 2020