Phishguard

A machine-learning phishing email classifier that analyzes raw email text and returns a verdict, confidence score, and plain-English explanation of the red flags it found.

Shipped

Python · Gradio · XGBoost · NLP

Overview

Phishguard classifies email text as phishing or legitimate using an XGBoost model trained on a labeled corpus of phishing and legitimate (ham) corporate emails. The feature pipeline combines TF-IDF over the email body with hand-engineered signals (URL token counts, domain mismatch indicators, urgency-word frequency). XGBoost was chosen over a deep model for its strong performance on tabular features, fast inference on CPU, and the interpretability of its feature-importance output. The app is served through a Gradio UI hosted on a Hugging Face Space, which makes it trivially shareable and keeps hosting free.
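To make the hand-engineered signals concrete, here is a minimal sketch of how they might be computed with only the Python standard library. It is an illustration, not Phishguard's actual code: the function name `handcrafted_features`, the `sender_domain` parameter, the urgency word list, and the exact definition of a domain mismatch are all assumptions.

```python
import re
from urllib.parse import urlparse

# Hypothetical urgency vocabulary; the project's real word list is not shown.
URGENCY_WORDS = {"urgent", "immediately", "verify", "suspended", "expires", "now"}

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
WORD_RE = re.compile(r"[a-z']+")


def handcrafted_features(body: str, sender_domain: str = "") -> dict:
    """Return a few numeric signals for one email (illustrative only)."""
    urls = URL_RE.findall(body)
    hosts = [(urlparse(u).hostname or "").lower() for u in urls]
    words = WORD_RE.findall(body.lower())

    urgency_hits = sum(1 for w in words if w in URGENCY_WORDS)

    # Assumed definition: a link mismatches when its host is neither the
    # sender's domain nor a subdomain of it.
    sender = sender_domain.lower()
    mismatches = sum(
        1 for h in hosts
        if sender and h != sender and not h.endswith("." + sender)
    )

    return {
        "url_count": len(urls),
        "domain_mismatch": int(mismatches > 0),
        "urgency_freq": urgency_hits / len(words) if words else 0.0,
    }


if __name__ == "__main__":
    sample = "URGENT: verify your account now at http://paypa1-login.example.net/reset"
    print(handcrafted_features(sample, sender_domain="paypal.com"))
```

In a full pipeline, values like these would typically be appended as extra columns next to the TF-IDF matrix (for example with `scipy.sparse.hstack`) before being passed to the classifier.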

Challenges

Balancing false-positive cost against recall was the hardest call: a phishing classifier that cries wolf gets ignored, but one that lets attacks through defeats the point. I also had to fit the model and runtime into the Hugging Face Spaces free-tier resource budget without sacrificing latency.
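One common way to handle this trade-off is to cap the false-positive rate on a validation set and then pick the decision threshold that gives the highest recall under that cap. The sketch below shows the idea in plain Python. It is not necessarily how Phishguard sets its threshold: `pick_threshold`, `max_fpr`, and the sample scores are hypothetical.

```python
def pick_threshold(scores, labels, max_fpr=0.01):
    """Return (threshold, recall, fpr) maximizing recall subject to fpr <= max_fpr.

    scores: predicted phishing probabilities from a validation set.
    labels: 1 = phishing, 0 = legitimate.
    An email is flagged when its score is >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Fallback when even the strictest threshold breaks the cap: flag nothing.
    best = (float("inf"), 0.0, 0.0)

    # Walk thresholds from strict to lenient. Recall and FPR can only grow
    # as the threshold drops, so stop at the first one that exceeds the cap.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fpr = fp / negatives if negatives else 0.0
        if fpr > max_fpr:
            break
        recall = tp / positives if positives else 0.0
        best = (t, recall, fpr)
    return best


if __name__ == "__main__":
    # Made-up validation scores, for illustration only.
    scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.20, 0.10, 0.05]
    labels = [1, 1, 0, 1, 0, 0, 0, 0]
    print(pick_threshold(scores, labels, max_fpr=0.0))
    print(pick_threshold(scores, labels, max_fpr=0.2))
```

Tightening `max_fpr` trades missed phishing emails for fewer false alarms, which is the "cries wolf" versus "lets attacks through" tension described above.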

What I learned

This was the first web-based application I've ever pushed into a production environment. I also learned about the nuances of different machine-learning tools such as XGBoost and scikit-learn.

View on GitHub · Live demo