A Databricks ML project predicting U.S. domestic flight delays using PySpark and MLlib at scale across millions of flight records.
Authors
Published

April 1, 2025

Modified

April 1, 2025

Flight Delay Predictions

A Databricks-based machine learning project that predicts U.S. domestic flight delays from millions of flight and weather records. Using PySpark and MLlib, we ran distributed data processing and trained classification models at scale. Feature engineering and model training relied on MapReduce-style operations to surface delay patterns driven by weather, airline, and airport congestion. The project demonstrates how cloud-native platforms and big data tools can support scalable, real-time predictive analytics.
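
To make the MapReduce-style feature engineering concrete, here is a minimal sketch in plain Python of one such aggregation: a per-airline delay rate. It is an illustration only, not the project's actual code. The toy records, the 15-minute delay cutoff, and every function name are assumptions introduced for this example.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical toy records: (airline, origin_airport, departure_delay_minutes).
FLIGHTS = [
    ("AA", "JFK", 22),
    ("AA", "JFK", 3),
    ("DL", "ATL", 0),
    ("DL", "ATL", 45),
    ("DL", "JFK", 16),
    ("UA", "ORD", 5),
]

# Assumed cutoff for labeling a flight as delayed; the source does not state one.
DELAY_THRESHOLD_MIN = 15


def map_phase(record):
    """Map step: emit (airline, (delayed_flag, 1)) for one flight record."""
    airline, _origin, delay = record
    delayed = 1 if delay >= DELAY_THRESHOLD_MIN else 0
    return airline, (delayed, 1)


def combine(a, b):
    """Reduce step: add delayed counts and flight counts element-wise."""
    return a[0] + b[0], a[1] + b[1]


def reduce_by_key(pairs, combine_fn):
    """Group values by key, then fold each group with combine_fn."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(combine_fn, values) for key, values in grouped.items()}


def delay_rate_by_airline(records):
    """Return the fraction of delayed flights per airline."""
    totals = reduce_by_key(map(map_phase, records), combine)
    return {airline: delayed / total for airline, (delayed, total) in totals.items()}


rates = delay_rate_by_airline(FLIGHTS)
```

On a Spark cluster the same shape would typically be written with `rdd.map(...)` followed by `rdd.reduceByKey(...)`, or as a DataFrame `groupBy` aggregation. Because `combine` is associative and commutative, partial sums can be computed on each partition before being merged.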

Citation

For attribution, please cite this work as:
Bakr, Mohamed, Erica Landreth, Danielle Yoseloff, and Shruti Gupta. 2025. “Flight Delay Predictions.” April 1. https://mohdbakr.com/projects/flight-delays/.