Challenge
The chemical formulation database at Univar Solutions needed cleanup, but manual edits had lost their connection to the original record IDs, so the cleaned data could not be written back to the database.
Solution
Data Pipeline
- Built reconciliation system matching outdated names to original IDs
- Enabled bidirectional sync between golden dataset and production DB
- Automated cleanup and consolidation of redundant labels
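The reconciliation step above can be illustrated with a minimal sketch. It is not the production implementation: the record IDs, formulation names, the `normalize` and `reconcile` helpers, and the 0.85 similarity cutoff are all hypothetical, and the use of Python's standard-library `difflib` for fuzzy matching is an assumption, since the actual matching method is not described here. The idea is to normalize both sides, try an exact match first, fall back to a close match, and send anything ambiguous or unmatched to manual review.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase the name and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def reconcile(
    original: dict[str, str], edited_names: list[str], cutoff: float = 0.85
) -> tuple[dict[str, str], list[str]]:
    """Map manually edited names back to original record IDs.

    Returns (matches, unmatched). `matches` maps each edited name to one
    original ID; `unmatched` holds names left for manual review.
    """
    # Index original IDs by normalized name. A key with several IDs is
    # ambiguous and is never matched automatically.
    by_norm: dict[str, list[str]] = {}
    for record_id, name in original.items():
        by_norm.setdefault(normalize(name), []).append(record_id)

    matches: dict[str, str] = {}
    unmatched: list[str] = []
    for edited in edited_names:
        key = normalize(edited)
        if key not in by_norm:
            # No exact match after normalization: try the closest candidate.
            close = difflib.get_close_matches(key, list(by_norm), n=1, cutoff=cutoff)
            key = close[0] if close else None
        ids = by_norm.get(key, [])
        if len(ids) == 1:
            matches[edited] = ids[0]
        else:
            unmatched.append(edited)
    return matches, unmatched


# Hypothetical sample data, for illustration only.
original = {
    "F-001": "Sodium Hydroxide, 50% soln",
    "F-002": "Isopropyl Alcohol (IPA)",
    "F-003": "Citric Acid Anhydrous",
}
edited = [
    "sodium hydroxide 50% soln",
    "Isopropyl alcohol IPA",
    "Citric Acid Anhydrus",
    "Xylene",
]

matches, unmatched = reconcile(original, edited)
print(matches)
print(unmatched)
```

Keeping unmatched and ambiguous names out of the automatic mapping is a deliberate choice in this sketch: a wrong ID written back to production is more costly than a name left for a person to review.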
Team Enablement
- Introduced Jupyter notebooks to annotation team
- Created tools for efficient data cleanup
- Trained team on data engineering best practices
Impact
- Saved hundreds of hours of manual work
- Made previously impossible database updates feasible
- Improved data quality for Univar Solutions' chemical supply network
Technologies
- Data engineering and ETL
- Database reconciliation algorithms
- Jupyter for collaborative data work
Contract role through Potion, providing services to Univar Solutions