Run an Apache Airflow DAG with Docker Compose and PostgreSQL

A hands-on guide to deploying a production-grade document ingestion pipeline using Apache Airflow, FastAPI, Docker Compose, and PostgreSQL. It covers the full runtime setup: spinning up five Docker containers, uploading PDFs via a FastAPI endpoint, triggering and monitoring Airflow DAGs, and verifying parsed chunks in PostgreSQL. It includes code for PDF parsing with PyPDF, sliding-window text chunking, SHA-256 content hashing for deduplication, and an idempotent PostgreSQL init script. It also discusses error handling for corrupted files, key design principles (idempotency, observability, reproducibility, and data provenance), and the practical limits of Airflow for GPU-heavy ML workloads such as embedding generation.
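To make the chunking and deduplication steps concrete, here is a minimal sketch of sliding-window chunking combined with SHA-256 content hashing. It is an illustration under stated assumptions, not the guide's actual implementation: the function names (`chunk_text`, `content_hash`, `dedupe_chunks`), the character-based window, and the default `size` and `overlap` values are all hypothetical.

```python
import hashlib


def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`.

    Assumption: windows are measured in characters; a real pipeline might
    count tokens or sentences instead.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once the current window reaches the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + size >= len(text):
            break
    return chunks


def content_hash(chunk: str) -> str:
    """Return the SHA-256 hex digest of a chunk's UTF-8 bytes."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def dedupe_chunks(chunks: list[str]) -> list[tuple[str, str]]:
    """Keep the first occurrence of each distinct chunk as (hash, chunk)."""
    seen = set()
    unique = []
    for chunk in chunks:
        digest = content_hash(chunk)
        if digest not in seen:
            seen.add(digest)
            unique.append((digest, chunk))
    return unique
```

Because the hash depends only on the chunk's content, re-ingesting the same file produces the same digests. Storing the digest in a column with a unique constraint would then let the database reject repeats, which is one way to keep a rerun of the pipeline idempotent.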