Work

Projects, tradeoffs included.

Every project below lists what was traded away and what didn't work - because real engineering is choosing constraints, and pretending otherwise helps no one.

NCAA Analytics Challenge

Machine Learning · Gradient Boosting · Feature Engineering · Tableau

github ↗ · linkedin ↗

Won the NCAA Final Four Analytics Challenge: predicting seeds for the 2026 NCAA March Madness tournament.

Predicted NCAA Tournament seedings for over 360 college basketball teams using five seasons of historical data. The real challenge was reverse-engineering how the selection committee weighs NET rankings, quadrant records, and conference strength. We built a seven-model gradient-boosting ensemble over 104 engineered features that reached 78% accuracy, cutting prediction error by 43% versus the baseline. I then used Tableau dashboards to turn the findings into a clear narrative for NCAA stakeholders.

Honest notes - what was traded away

–Minimizing RMSE on seeds sounds clean on paper, but committee logic is inconsistent - most of the work was iterative error analysis to find where the model was systematically wrong, then encoding those patterns as features.
–The Tableau narrative ended up mattering as much as the model when presenting to judges, a reminder that accuracy alone only gets you so far.
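The evaluation loop in the first note (score seed predictions by RMSE, then hunt for systematic misses) can be sketched in a few lines. This is a minimal illustration, assuming the ensemble members' per-team seed predictions are averaged with equal weights; the actual weighting isn't described above, and `ensemble_predict`, `rmse`, `worst_misses`, and the toy numbers are hypothetical.

```python
import math
from statistics import mean

def ensemble_predict(model_preds: list[list[float]]) -> list[float]:
    """Average per-team seed predictions across models (equal weights are an assumption)."""
    return [mean(team) for team in zip(*model_preds)]

def rmse(pred: list[float], actual: list[float]) -> float:
    """Root-mean-squared error between predicted and actual seeds."""
    return math.sqrt(mean((p - a) ** 2 for p, a in zip(pred, actual)))

def worst_misses(teams, pred, actual, k=3):
    """Teams with the largest absolute residuals: the starting point for error analysis."""
    residuals = [(t, p - a) for t, p, a in zip(teams, pred, actual)]
    return sorted(residuals, key=lambda r: abs(r[1]), reverse=True)[:k]

# Toy data: two hypothetical models, three teams.
preds = ensemble_predict([[1.0, 4.0, 10.0], [2.0, 5.0, 12.0]])
actual = [1, 5, 9]
print(preds, rmse(preds, actual), worst_misses(["A", "B", "C"], preds, actual, k=1))
```

If the same teams keep appearing in `worst_misses` across seasons, that pattern is a candidate for a new feature rather than more model tuning.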

Yanck

LangChain · Google Gemini · RAG · Flask

github ↗

No-code RAG chatbot platform for teams with zero ML engineers.

Built a platform that lets non-technical users create and deploy AI assistants on their own data through a guided workflow: upload documents, generate embeddings, and serve responses through Google Gemini. The hardest part was making integration painless for whoever handles it on the client's side, so we shipped three deployment paths: an embeddable JS widget, an iframe, and a REST API with key-based auth.

Honest notes - what was traded away

–Chose a guided linear workflow over a flexible node editor - less powerful, but non-technical users finished setup instead of abandoning it.
–Retrieval quality depends heavily on how users chunk their uploads; automatic chunking heuristics are good enough, not great.
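Why chunking drives retrieval quality is easiest to see with the simplest heuristic. This is a minimal sketch, assuming word-based windows with a fixed size and overlap; `chunk_words` and its defaults are hypothetical and are not Yanck's actual splitter.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the document
    return chunks

print(chunk_words(" ".join(str(i) for i in range(10)), size=4, overlap=1))
```

Overlap means a sentence cut at one boundary still appears intact in a neighbouring chunk as long as it is shorter than the overlap. Fixed windows ignore headings, tables, and section breaks, which is one reason heuristics like this land at "good enough, not great."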

SOFI 2035

Data Visualization · Plotly · Python · Data Pre-processing

live ↗ · github ↗

Interactive global-futures dashboard built for The Millennium Project.

Built an interactive dashboard for a real client, The Millennium Project, that lets users explore global development trends across economic, social, environmental, governance, and technology indicators through dynamic visualizations and scenario analysis.

Honest notes - what was traded away

–Server-rendered Plotly charts kept development fast but make first load heavier than a hand-rolled D3 build would be.
–Scenario analysis is precomputed rather than live - simpler and more reliable on free-tier hosting, at the cost of interactivity.

Healthcare Data Pipeline

Airflow · PySpark · AWS S3 · Tableau · FHIR

github ↗

ETL pipeline standardizing patient and device data, with live monitoring dashboards.

I engineered an end-to-end pipeline using Spark and Airflow to standardize FHIR patient and device data landing in AWS S3, and connected Tableau dashboards to live MySQL data so the team could monitor throughput and utilization metrics in real time.
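Standardizing FHIR data largely means flattening nested resources into fixed columns before loading. Below is a minimal, stdlib-only sketch for a FHIR `Patient` resource, assuming only the standard `id`, `gender`, `birthDate`, and `name` elements matter. `flatten_patient` and the output column names are hypothetical; in a Spark pipeline the same mapping would be expressed as DataFrame transformations.

```python
import json

def flatten_patient(resource: dict) -> dict:
    """Map a FHIR Patient resource to a flat row with fixed columns (hypothetical schema)."""
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a FHIR Patient resource")
    names = resource.get("name", [])
    # Prefer the name marked 'official'; otherwise fall back to the first listed name.
    name = next((n for n in names if n.get("use") == "official"), names[0] if names else {})
    return {
        "patient_id": resource.get("id"),
        "family_name": name.get("family"),
        "given_names": " ".join(name.get("given", [])) or None,
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }

raw = ('{"resourceType": "Patient", "id": "p1", "gender": "female", "birthDate": "1980-04-02", '
       '"name": [{"use": "official", "family": "Doe", "given": ["Jane", "A."]}]}')
print(flatten_patient(json.loads(raw)))
```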

Honest notes - what was traded away

–Spark is overkill for the demo data volume; it was used deliberately to exercise the same tooling that production-scale volumes need.
–Live MySQL-backed dashboards demo well but need connection pooling and caching before they'd survive real concurrent load.

AI Expense Tracker

FastAPI · BERT · Qdrant · Time Series · Power BI

github ↗

Voice-driven expense tracking with semantic categorization at 92% accuracy.

I built a FastAPI backend for an AI-powered expense tracker that supports voice-based transaction processing and real-time categorization. I categorized expenses semantically using BERT embeddings with Qdrant vector search, reaching 92% classification accuracy, and added Power BI dashboards for time-series spending forecasts. The project evolved from Yafa, my winning GeeksforGeeks Hackathon project.

Honest notes - what was traded away

–Voice-to-text accuracy drops with background noise and accents; a confirmation step was added rather than chasing model accuracy.
–Embedding-based categorization beats rule-based matching and reaches 92%, but the remaining 8% is ambiguous even to humans, so a category-correction flow mattered more than model tuning.
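The split between confident predictions and a correction flow can be sketched as nearest-category matching by cosine similarity with a threshold. This assumes expense embeddings are already computed (by BERT in the project; toy 3-d vectors here) and uses an in-memory dict of category vectors instead of Qdrant. `categorize`, the vectors, and the 0.8 threshold are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def categorize(embedding, category_vectors, threshold=0.8):
    """Return (category, score). Category is None when the best match falls below the
    threshold, signalling that the user should confirm or correct it."""
    best, score = max(
        ((cat, cosine(embedding, vec)) for cat, vec in category_vectors.items()),
        key=lambda pair: pair[1],
    )
    return (best if score >= threshold else None), score

categories = {"groceries": [1.0, 0.0, 0.0], "transport": [0.0, 1.0, 0.0]}
print(categorize([0.9, 0.1, 0.0], categories))  # confident match
print(categorize([0.5, 0.5, 0.7], categories))  # ambiguous: routed to correction
```

With a vector database, the nearest-neighbour lookup would run against stored, labelled expenses rather than a small dict, but the thresholding decision stays the same.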

Optical Music Recognition

Computer Vision · OpenCV · Python · Image Processing

Computer vision system that digitizes sheet music from images.

Implemented a computer vision system that detects musical notes in images of sheet music, using image processing and pattern recognition to convert physical sheets into editable digital formats.

Honest notes - what was traded away

–Classical template matching over a learned model - explainable and training-data-free, but it degrades on handwritten or low-quality scans.
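Classical template matching slides a small glyph template over the image and scores every position; zero-mean normalized cross-correlation is a common score (OpenCV exposes it as `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED`). This is a dependency-free sketch on tiny grayscale grids stored as lists of lists; `ncc`, `match_template`, and the 0.9 threshold are hypothetical, not this project's code.

```python
import math

def ncc(patch, template):
    """Zero-mean normalized cross-correlation between two equal-sized 2-D grids."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    pm, tm = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - pm) * (b - tm) for a, b in zip(p, t))
    den = math.sqrt(sum((a - pm) ** 2 for a in p) * sum((b - tm) ** 2 for b in t))
    return num / den if den else 0.0  # flat regions carry no shape information

def match_template(image, template, threshold=0.9):
    """Return (row, col, score) for every position whose NCC meets the threshold."""
    th, tw = len(template), len(template[0])
    hits = []
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score >= threshold:
                hits.append((r, c, score))
    return hits

template = [[0, 9], [9, 0]]  # toy diagonal stroke
image = [[0, 0, 0, 0],
         [0, 0, 9, 0],
         [0, 9, 0, 0]]
print(match_template(image, template))
```

Because the score rewards an exact shape match, handwritten or noisy glyphs drift below the threshold, which is the degradation the note above describes.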