Callum Dourneen
Software Developer
project prem
A Python-based machine learning model that forecasts match outcomes.
About
The Premier League Predictor is a data-driven tool built to forecast football match outcomes using machine learning. The project gathers and processes historical Premier League data, team statistics, and form metrics to train a classification model that predicts match results. It was designed to explore the application of artificial intelligence in sports analytics and showcases skills in data preprocessing, model training, evaluation, and deployment. The predictor provides a practical example of how AI can interpret and act on real-world data.
Project Details
App Title: Premier League Predictor
Genre: Data Science / Sports Analytics
Platform: PC
Development Tools: Python, Pandas, Scikit-learn
App Features:
  • Preprocess and analyze historical Premier League data
  • Train a classification model to predict match outcomes
  • Evaluate model accuracy and refine predictions
  • Visualize team performance trends and prediction results
  • Explore potential real-world applications of AI in sports
Technical Analysis
    What I Learnt
    Applying Machine Learning:
    This project was my first experience using machine learning to build a system that makes predictions from real-world data. I learned how to structure and clean a dataset, train a model using logistic regression, and interpret the prediction results.
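    A minimal sketch of that workflow, not the project's actual code: the column names (`home_form`, `away_form`, `result`) and the toy values are assumptions made purely for illustration. It cleans a small table, fits a scikit-learn logistic regression, and reads back class probabilities.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data: column names and values are illustrative only.
matches = pd.DataFrame({
    "home_form": [2.0, 1.2, 0.8, 2.4, 1.0, 0.4, 1.8, 2.2, 0.6, 1.4, 2.6, 0.2],
    "away_form": [1.0, 1.4, 2.2, 0.6, 1.0, 2.0, 0.8, 1.2, 2.4, 1.6, 0.4, 2.8],
    "result":    ["H", "D", "A", "H", "D", "A", "H", "H", "A", "D", "H", "A"],
})

# Cleaning: drop incomplete and duplicated rows before training.
matches = matches.dropna().drop_duplicates()

X = matches[["home_form", "away_form"]]
y = matches["result"]  # H = home win, D = draw, A = away win

# Hold back a quarter of the matches to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))

# One row of probabilities per test match, ordered as in model.classes_.
probabilities = model.predict_proba(X_test)
print(model.classes_, probabilities.round(2), accuracy)
```

    Reading `predict_proba` alongside `model.classes_` is one way to interpret a prediction: it shows how confident the model is in each outcome rather than only the most likely label.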
    Retrospective
    Future Improvements:
    Looking back, this project gave me a strong foundation in both machine learning and dataset engineering. One of the biggest challenges was preparing data that the model could learn from accurately, especially distinguishing between home and away teams. Initially, the model couldn't tell the two roles apart, so I had to encode them explicitly with one-hot encoding. I also learned the importance of input validation in the GUI to prevent model errors caused by incorrect user selections. In the future, I would like to expand the dataset and explore more advanced models.
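    The home/away encoding and the input check can be sketched as follows. This is an illustration under assumptions, not the project's code: the team list, `validate_selection`, and `encode_fixture` are hypothetical names. Each team gets one column per role, so the same club at home and away maps to different features.

```python
import pandas as pd

# Hypothetical subset of teams the model was trained on.
TEAMS = ["Arsenal", "Chelsea", "Liverpool"]


def validate_selection(home, away, known_teams=TEAMS):
    """Reject selections that the model could not handle."""
    if home not in known_teams or away not in known_teams:
        raise ValueError("Unknown team selected")
    if home == away:
        raise ValueError("Home and away teams must be different")


def encode_fixture(home, away, known_teams=TEAMS):
    """One-hot encode a fixture with separate home_* and away_* columns."""
    validate_selection(home, away, known_teams)
    fixture = pd.DataFrame({
        # Fixed categories keep the column layout identical for every fixture.
        "home_team": pd.Categorical([home], categories=known_teams),
        "away_team": pd.Categorical([away], categories=known_teams),
    })
    return pd.get_dummies(
        fixture,
        columns=["home_team", "away_team"],
        prefix=["home", "away"],
        dtype=int,
    )


row = encode_fixture("Arsenal", "Chelsea")
print(row)
```

    Running the validation before encoding means a bad selection in the interface raises a clear error instead of reaching the model as a malformed feature row.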