An ETL pipeline that populates a PostgreSQL database optimised for OLAP.
- Integrated geospatial data to explore species distributions within park boundaries using PostGIS.
- Extracted, cleaned, and normalised species data with category-specific transformation strategies (e.g., birds, mammals, reptiles).
- Resolved duplicates, typographical errors, and inconsistencies, ensuring reliable taxonomic data for each species.
- Generated detailed logs, backups, and manual review files to ensure traceability and integrity across all transformation stages.
- Designed the pipeline as independent modules, so new species categories and processing enhancements can be added without changing existing stages.
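
The category-specific, modular design described above could look roughly like the sketch below. It is illustrative only, not the project's actual code: the class names, the `TRANSFORMERS` registry, the `transform_category` helper, and the `scientific_name` / `category` fields are all assumptions made for the example.

```python
from abc import ABC, abstractmethod


class CategoryTransformer(ABC):
    """Hypothetical base class: shared cleaning plus one category-specific hook."""

    def transform(self, records):
        seen = set()
        cleaned = []
        for record in records:
            row = self.normalise(record)
            key = row["scientific_name"].lower()
            if key in seen:  # drop duplicates that only differ before normalisation
                continue
            seen.add(key)
            cleaned.append(row)
        return cleaned

    def normalise(self, record):
        # Collapse repeated whitespace and apply "Genus species" capitalisation.
        name = " ".join(record["scientific_name"].split()).capitalize()
        row = {**record, "scientific_name": name}
        return self.apply_category_rules(row)

    @abstractmethod
    def apply_category_rules(self, row):
        """Return the row after category-specific adjustments."""


class BirdTransformer(CategoryTransformer):
    def apply_category_rules(self, row):
        return {**row, "category": "Bird"}


class MammalTransformer(CategoryTransformer):
    def apply_category_rules(self, row):
        return {**row, "category": "Mammal"}


# Assumed registry: supporting a new category means adding one class and one entry.
TRANSFORMERS = {"bird": BirdTransformer, "mammal": MammalTransformer}


def transform_category(category, records):
    """Look up the transformer for a category and run it over raw records."""
    return TRANSFORMERS[category]().transform(records)


if __name__ == "__main__":
    raw = [
        {"scientific_name": "  corvus   CORAX "},
        {"scientific_name": "Corvus corax"},
    ]
    print(transform_category("bird", raw))
```

In this sketch the shared steps (whitespace cleanup, capitalisation, de-duplication) live in the base class, and each category overrides only `apply_category_rules`, which is one way to keep extensions isolated from existing stages.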
Python
SQL
ETL
OOP
Geospatial Analysis