Run an Apache Airflow DAG with Docker Compose and PostgreSQL

A hands-on guide to deploying a production-grade document ingestion pipeline using Apache Airflow, FastAPI, Docker Compose, and PostgreSQL. Covers the full runtime setup: spinning up 5 Docker containers, uploading PDFs via a FastAPI endpoint, triggering and monitoring Airflow DAGs, and verifying parsed chunks in PostgreSQL. Includes code for PDF parsing with PyPDF, sliding-window text chunking, SHA-256 content hashing for deduplication, and an idempotent PostgreSQL init script. Also discusses error handling for corrupted files, key design principles (idempotency, observability, reproducibility, data provenance), and the practical limits of Airflow for GPU-heavy ML workloads like embedding generation.

Posted on June 8, 2026
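The summary above mentions sliding-window text chunking and SHA-256 content hashing for deduplication. The article's own code is not reproduced here, so what follows is a minimal, standard-library-only sketch under stated assumptions. Chunk sizes are measured in characters. The defaults `chunk_size=1000` and `overlap=200`, the helper names `sliding_window_chunks`, `sha256_hex` and `dedupe_chunks`, and the `Chunk` record are all hypothetical choices made for illustration, not the author's.

```python
import hashlib
from dataclasses import dataclass


# Hypothetical record for one parsed chunk (not the article's schema).
@dataclass(frozen=True)
class Chunk:
    index: int          # position of the window within the document
    text: str           # chunk contents
    content_hash: str   # SHA-256 hex digest of the chunk text


def sliding_window_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of `chunk_size` characters.

    Consecutive windows share `overlap` characters, so a sentence cut at
    one boundary still appears whole in a neighbouring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def sha256_hex(text: str) -> str:
    """Hash the UTF-8 bytes of the text; identical content gives an identical key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dedupe_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[Chunk]:
    """Chunk the text and skip any chunk whose content hash was already seen."""
    seen: set[str] = set()
    result: list[Chunk] = []
    for i, piece in enumerate(sliding_window_chunks(text, chunk_size, overlap)):
        digest = sha256_hex(piece)
        if digest in seen:
            continue  # duplicate content: keep only the first occurrence
        seen.add(digest)
        result.append(Chunk(index=i, text=piece, content_hash=digest))
    return result
```

Because the hash depends only on the chunk text, re-running the pipeline on the same PDF yields the same keys. In PostgreSQL the same guarantee is commonly enforced at the database level with a `UNIQUE` constraint on the hash column and `INSERT ... ON CONFLICT DO NOTHING`, which keeps DAG retries idempotent.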