Tarpit59/llm-insurance-autofill

🧾 Insurance Template Filler – Web App

This web app allows users to upload insurance photo report PDFs and a .docx template. The system uses OCR + AI to extract relevant information and automatically fills the template. The final result can be downloaded as a filled PDF or viewed directly in the browser.

📁 Project Structure

.
├── insurance_pipeline/     # Core pipeline (OCR, extraction, LLMs, etc.)
├── sample/                 # Sample input/output files
├── app.py                  # Streamlit app for UI interaction
├── .env                    # API keys
├── requirements.txt        # Dependencies list
└── README.md               # Project documentation

🚀 Setup Instructions

  1. Create & Activate Virtual Environment:
python3.9 -m venv task_3
source task_3/bin/activate   # macOS/Linux
task_3\Scripts\activate      # Windows
  2. Install PaddlePaddle (the framework PaddleOCR runs on):

If you have a GPU and CUDA 11.8:

python -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

If not, use the CPU version:

pip install paddlepaddle
  3. Install Other Dependencies:
pip install -r requirements.txt
  4. Add API Keys to .env File:
  • Make sure your .env file includes:
OPENROUTER_API_KEY = "openrouter_api_key"
GOOGLE_API_KEY = "google_api_key"
PINECONE_API_KEY = "pinecone_api_key"
COHERE_API_KEY = "cohere_api_key"
GROQ_API_KEY = "groq_api_key"
CONVERTAPI_API_KEY = "convertapi_api_key"
  5. Run the Application:
streamlit run app.py
  • A local server will start and open the app in your default browser.
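
Before launching the app, it can help to confirm that all six keys are actually set. The following is a minimal sketch, not part of the repository: the helper name missing_keys is hypothetical, and it reads from the process environment, so it assumes the .env file has already been loaded (how the app itself loads .env is not shown here).

```python
import os

# Key names copied from the .env listing above.
REQUIRED_KEYS = [
    "OPENROUTER_API_KEY",
    "GOOGLE_API_KEY",
    "PINECONE_API_KEY",
    "COHERE_API_KEY",
    "GROQ_API_KEY",
    "CONVERTAPI_API_KEY",
]


def missing_keys(env=None):
    """Return the required key names that are absent or blank in env.

    Hypothetical helper for illustration; env defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All API keys are set.")
```

Passing a plain dict instead of os.environ makes the check easy to exercise without touching real credentials.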

🧠 Pipeline Overview

┌────────────────────────────┐
│        Upload Inputs       │
│ ┌────────────────────────┐ │
│ │       Report PDFs      │ │
│ │     .docx Template     │ │
│ └────────────────────────┘ │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────┐
│    OCR + Text Chunking     │
│ - OCR PDFs                 │
│ - Split into text chunks   │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────┐
│  Embedding + Pinecone DB   │
│ - Convert chunks to vectors│
│ - Store in Pinecone index  │
└────────────┬───────────────┘
             │
             ▼
┌──────────────────────────────────────┐
│   Field Meaning Extraction (LLM)     │
│ - Extract placeholders from .docx    │
│ - Understand meaning (OpenRouter LLM)│
└────────────┬─────────────────────────┘
             │
             ▼
┌──────────────────────────────────────┐
│     Semantic Retrieval + QA          │
│ - Similarity search (Pinecone)       │
│ - Rerank with Cohere                 │
│ - Final answer via GROQ LLM          │
└────────────┬─────────────────────────┘
             │
             ▼
┌────────────────────────────┐
│    Fill Template Fields    │
│ - Replace placeholders     │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────┐
│      Convert to PDF        │
│ - Use ConvertAPI           │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────┐
│    Preview & Download PDF  │
│ - View PDF in browser      │
│ - Download final PDF       │
└────────────────────────────┘
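
Two of the stages in the diagram, text chunking and placeholder replacement, can be sketched in plain Python without any external service. This is an illustrative sketch only: the function names, the chunk sizes, and the {{field}} placeholder syntax are all assumptions, and the project's real chunker and .docx handling may work differently.

```python
import re

# Assumed placeholder syntax for illustration: {{field_name}}
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into character windows that overlap by `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


def find_placeholders(template_text):
    """Return unique placeholder names in order of first appearance."""
    return list(dict.fromkeys(PLACEHOLDER.findall(template_text)))


def fill_placeholders(template_text, answers):
    """Replace each placeholder that has an answer; leave the rest as-is."""
    def substitute(match):
        return answers.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, template_text)
```

Leaving unanswered placeholders untouched keeps gaps visible in the output instead of silently blanking them, which is a design choice of this sketch rather than documented behavior of the app.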

⏱️ Performance Note

To manage LLM API usage and stay within rate limits, the pipeline waits between field queries. The delay is set in insurance_pipeline/qa_utils.py, inside extract_all_fields:

def extract_all_fields(...):
    ...
    time.sleep(5)  # Delay between LLM requests
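
If you want the delay to be adjustable without editing the source each time, one option is to pass it in as a parameter. The sketch below is a hypothetical stand-in, not the code in qa_utils.py: query_fields, ask, and delay_seconds are assumed names, and ask represents whatever function performs one LLM query.

```python
import time


def query_fields(fields, ask, delay_seconds=5.0, sleep=time.sleep):
    """Call ask(field) for each field, pausing between consecutive calls.

    Hypothetical helper: `ask` stands in for a single LLM request.
    `sleep` is injectable so the pause can be replaced in tests.
    """
    answers = {}
    for index, field in enumerate(fields):
        if index > 0:
            # Pause only between requests, not before the first one.
            sleep(delay_seconds)
        answers[field] = ask(field)
    return answers
```

With three fields this pauses twice, so the total added wait is (number of fields - 1) × delay_seconds.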

📸 Sample Files

  • You can find sample .docx templates and insurance report PDFs in the sample/ directory for testing.

🙏 Acknowledgements

About

AI-powered Python tool using LLMs to automate insurance form autofill from unstructured documents. Boosts efficiency for claims, underwriting, and policy processing.
