Data Science Projects

NEXUS - Local AI Research Agent

Built a fully local AI research agent rivaling commercial tools like Perplexity Pro. Features 35+ commands, 8 expert personas, a 7-phase research pipeline with source reliability scoring, hypothesis testing, and a persistent ChromaDB knowledge base - all at zero API cost with complete data privacy.

Python, AI, LLM, Docker, ChromaDB, RAG

rag0 - In-Browser RAG Search Engine

A complete retrieval-augmented generation pipeline that runs entirely in the browser. Embeds queries with MiniLM via Transformers.js, performs cosine similarity vector search, and generates answers with SmolLM2 on WebGPU - zero servers, zero API keys, zero external services.

JavaScript, RAG, AI, WebGPU, NLP, Transformers.js
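
The cosine similarity search step can be sketched in a few lines. This is only an illustration in Python, not the project's JavaScript code: the function names `cosine_similarity` and `top_k` and the three-dimensional toy vectors are assumptions, standing in for real MiniLM embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return (doc_index, score) pairs for the k most similar documents."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]

# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only).
docs = [
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.1, 0.0]

results = top_k(query, docs, k=2)
for doc_index, score in results:
    print(doc_index, round(score, 3))
```

A brute-force scan like this is usually adequate for a small in-browser corpus; larger collections would typically call for an approximate nearest-neighbor index.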

American Ninja Warrior - Power BI Dashboard

Interactive 4-page Power BI analytics dashboard analyzing obstacle design patterns, competition structure, and geographical distribution across 10 seasons. Built with DAX measures, Bing Maps, interactive drill-downs, and KPI cards for data-driven storytelling.

Power BI, DAX, Data Visualization, Analytics

Data Visualization with Tableau

Interactive Tableau dashboards for business insight and executive reporting. Includes a World Bike Sales Dashboard with geographic analysis and a Sales Performance Dashboard with trend identification, KPI monitoring, and category-level drill-downs.

Tableau, Data Visualization, Business Intelligence, Analytics

Multi-Container Docker Application

Containerized full-stack application with Node.js, Express, and MongoDB orchestrated via Docker Compose. Implements production deployment with named volumes and a development workflow with bind mounts and live reload. Published to Docker Hub.

Docker, Node.js, MongoDB, DevOps, Express.js

Applied Time Series Analytics & Forecasting

Forecasted complex seasonal patterns across retail, energy, and economic domains using 8+ statistical models. TBATS and Holt-Winters consistently beat baselines, indicating that seasonality-aware models outperform naive approaches on these datasets.

R, Forecasting, Time Series
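
To show what "beating a baseline" means in practice, here is a minimal sketch of two common baselines scored by mean absolute error. It is written in Python rather than the project's R, the series is made-up toy data, and the helper names (`naive`, `seasonal_naive`, `mae`) are assumptions; TBATS and Holt-Winters themselves are not implemented here.

```python
def naive(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon

def seasonal_naive(history, season_length, horizon):
    """Repeat the last full season of observations."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Toy quarterly series with a strong seasonal pattern (hypothetical data).
series = [10, 20, 30, 40, 11, 21, 31, 41, 12, 22, 32, 42]
train, test = series[:8], series[8:]

naive_fc = naive(train, horizon=len(test))
snaive_fc = seasonal_naive(train, season_length=4, horizon=len(test))

print("naive MAE:", mae(test, naive_fc))            # 14.5
print("seasonal naive MAE:", mae(test, snaive_fc))  # 1.0
```

A candidate model is worth keeping only if its holdout error is lower than the error of baselines like these on the same test window.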

COVID-19 Mortality Prediction & Analysis

Built an end-to-end ML pipeline to predict COVID-19 mortality from global epidemiological data. Random Forest regression identified active case counts as the strongest predictor, with interpretable feature importance for policy insights.

Python, ML, EDA

Mortgage Payback Analytics

Modeled payoff likelihood across 622K+ loan-month records using logistic regression and Random Forest. Found that lower LTV ratios and rising housing prices significantly increase early payoff probability.

Python, Finance, ML

Software Mailing Response Analytics

Optimized direct mail targeting by ranking 50K+ customers into response-probability deciles. Logistic regression achieved an AUC of 0.902, concentrating high-value responders in the top deciles for profit-first mailing.

R, Logistic Regression, PCA
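
Decile ranking of this kind can be sketched as follows. The example is in Python rather than the project's R, the scores and responses are toy values, and `assign_deciles` and `response_rate_by_decile` are hypothetical helper names, not functions from the project.

```python
def assign_deciles(scores):
    """Map each score to a decile by rank; decile 1 holds the highest scores."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles

def response_rate_by_decile(scores, responded):
    """Observed response rate within each non-empty decile."""
    deciles = assign_deciles(scores)
    totals = {d: 0 for d in range(1, 11)}
    hits = {d: 0 for d in range(1, 11)}
    for d, r in zip(deciles, responded):
        totals[d] += 1
        hits[d] += r
    return {d: hits[d] / totals[d] for d in totals if totals[d] > 0}

# Toy data: 20 customers whose predicted probability rises with the index,
# where only the four highest-scored customers actually responded.
scores = [i / 20 for i in range(20)]
responded = [1 if i >= 16 else 0 for i in range(20)]

rates = response_rate_by_decile(scores, responded)
for decile in sorted(rates):
    print(decile, rates[decile])
```

Mailing from decile 1 downward until the expected profit per piece falls below the mailing cost is one way to turn such a table into a targeting rule.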

Smartphone Resale Price Prediction

Predicted resale prices for 3,400+ used devices using regression and classified them into pricing tiers. Multiple linear regression (MLR) achieved an R-squared of 0.78; logistic regression reached 86% accuracy for High/Low tier classification.

R, Regression, Classification

Predictive Viability Check (Pre-Modeling Feasibility)

A pre-modeling framework that catches data quality issues, target leakage, and distribution drift before any model is built. Prevents wasted effort on unreliable datasets by providing a structured go/no-go decision.

Python, EDA, Data Quality, Leakage, Drift, Governance
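
A go/no-go check of this kind might look roughly like the sketch below. It is not the framework's actual code: the function `viability_check`, the thresholds, and the toy feature names are all assumptions. It covers only two of the checks mentioned above (missing values and a crude correlation-based leakage flag) and leaves drift detection out.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def viability_check(features, target, leak_threshold=0.98, max_missing=0.3):
    """Return ("go" | "no-go", issues) for a dict of numeric feature columns.

    Both thresholds are illustrative defaults, not recommended values.
    """
    issues = []
    for name, values in features.items():
        missing = sum(v is None for v in values) / len(values)
        if missing > max_missing:
            issues.append((name, "too many missing values"))
            continue
        pairs = [(v, t) for v, t in zip(values, target) if v is not None]
        if not pairs:
            continue
        xs, ys = zip(*pairs)
        # A near-perfect correlation with the target is a common leakage symptom.
        if abs(pearson(xs, ys)) >= leak_threshold:
            issues.append((name, "possible target leakage"))
    return ("no-go" if issues else "go"), issues

# Toy dataset (hypothetical): "refund_issued" is a copy of the target.
target = [0, 1, 0, 1, 1, 0]
features = {
    "refund_issued": [0, 1, 0, 1, 1, 0],
    "age": [25, 40, 31, None, None, 52],
    "tenure": [1, 3, 5, 2, 4, 6],
}

decision, issues = viability_check(features, target)
print(decision)
for name, reason in issues:
    print(name, "->", reason)
```

Returning the list of issues alongside the decision keeps the outcome auditable, since a "no-go" can be traced to specific columns.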
