Key metrics that define my data science journey
Interactive charts powered by Chart.js
Real dashboards built with Power BI and Tableau
Interactive dashboard embed coming soon. In the meantime, explore my published dashboards below.
Client-side linear regression with pre-computed coefficients
| Factor | Impact |
|---|---|
| Click "Predict Price" to see results | |
Adjust the classification threshold and watch metrics change in real-time
Click on the canvas to place data points, then run K-Means to see clustering in action
Watch a machine learning algorithm learn simple rules to predict whether to play tennis
| Outlook | Temp | Humidity | Wind | Play? |
|---|
Analyze ML model fairness across demographic groups in a loan approval scenario
Click to add data points, then see how different regression models fit the data
Build a network, pick a dataset, and watch it learn decision boundaries in real-time
Type any text and watch NLP break it down word by word
Watch how gradient descent navigates a loss surface to find the minimum
Enter your experiment data and determine statistical significance
Simulate investment portfolio returns using random walks
See how prior beliefs update with evidence using Beta-Binomial conjugacy
Decompose airline passenger data into trend, seasonal, and residual components
Explore variable relationships visually -- click any cell to see the scatter plot
Scroll through a real ML project from problem to result
A software company was spending heavily on direct mail campaigns but seeing diminishing returns. Only 5.1% of recipients were responding, wasting budget on uninterested prospects. The goal: build a model to predict who will respond, maximizing ROI.
Collected 9,517 customer records with 41 features including demographics, purchase history, web activity, and prior campaign responses. Cleaned missing values, handled outliers, and performed stratified sampling to maintain class balance.
Applied PCA to reduce 41 features to 12 principal components capturing 87% of variance. Created interaction terms, applied log transforms to skewed distributions, and used clustering to segment customers into 4 behavioral groups.
Compared logistic regression, random forest, and gradient boosting. Tuned hyperparameters with 5-fold cross-validation. Logistic regression with PCA features delivered the best balance of interpretability and performance. Used decile ranking for targeting.
Achieved 0.902 AUC, meaning the model correctly ranks responders above non-responders 90.2% of the time. The top 2 deciles captured 62% of all responders, allowing the company to cut mailing costs by 80% while retaining most conversions.
Try SQL queries against my portfolio database
Watch data flow through a real-world ETL pipeline with live processing metrics