I build end-to-end ML pipelines — from raw data to deployed models — with a focus on explainability, rigorous evaluation, and real-world impact.
Deep understanding of supervised & unsupervised learning, optimization theory, and model selection — not just library calls.
Hands-on experience with SMOTE, class weighting, and precision-recall tradeoffs on real-world imbalanced problems such as fraud detection.
From raw EDA → feature engineering → model training → evaluation → Streamlit deployment. No gaps in the workflow.
Every project includes problem framing, dataset context, engineering decisions, metric justification, and future roadmap.
Trained on Azure ML pipelines, MLflow experiment tracking, and the Hugging Face Model Hub through the DEPI Microsoft specialization.
Projects spanning NLP, Computer Vision, time-series anomaly detection, and structured tabular ML — versatile across problem types.
I'm Marwan Amir, an aspiring AI Engineer and Data Scientist currently pursuing a Bachelor of Science in Computer Science with an Artificial Intelligence track at Cairo Higher Institution, maintaining a 3.5 GPA.
My ML journey began with a fundamental question: how do machines extract signal from noise? That curiosity evolved into a disciplined engineering practice — I don't just train models, I study why they work, where they fail, and how to make them production-ready.
At NTI (National Telecommunication Institute), I developed hands-on proficiency in the full ML lifecycle — preprocessing pipelines, model evaluation, hyperparameter tuning, and applying data-driven decision making to real datasets. I'm simultaneously completing the DEPI Microsoft ML Engineer Specialization, focusing on Azure AI, MLflow, and deploying models at scale.
Long-term, I want to contribute to teams building ML systems that are not just accurate, but robust, interpretable, and genuinely useful — in domains ranging from healthcare and finance to computer vision applications.
Financial fraud causes billions in global losses annually. The core challenge: fraud events are <0.2% of transactions, making naive classifiers useless — they reach over 99.8% accuracy simply by predicting "not fraud" for everything.
284,807 transactions with 28 anonymized PCA features (V1–V28) plus Amount and Time. Class imbalance ratio ≈ 577:1. Standard Kaggle credit card fraud dataset.
Standard-scaled Amount and Time features. Applied SMOTE oversampling on training set only (no data leakage). Correlation heatmaps to identify key discriminating features.
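A minimal sketch of that leakage-free setup, assuming the standard Kaggle column names (Time, V1–V28, Amount, Class) and imbalanced-learn for SMOTE:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE

df = pd.read_csv("creditcard.csv")  # standard Kaggle file name
X, y = df.drop(columns=["Class"]), df["Class"]

# Split first so the test set never contains synthetic samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler()
X_train[["Amount", "Time"]] = scaler.fit_transform(X_train[["Amount", "Time"]])
X_test[["Amount", "Time"]] = scaler.transform(X_test[["Amount", "Time"]])

# SMOTE sees only training data, so no synthetic information leaks into evaluation.
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
```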
Logistic Regression (baseline) → Random Forest → XGBoost. Threshold tuning to optimize F1-score and minimize false negatives (frauds missed = real cost).
Accuracy is a misleading metric for imbalanced problems. Precision-Recall curves and F2-score (weighted toward recall) are far more actionable for fraud use-cases where false negatives carry financial and legal consequences.
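A sketch of that threshold sweep, continuing from the resampled split above and using a logistic regression baseline for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, precision_recall_curve

clf = LogisticRegression(max_iter=1000).fit(X_train_res, y_train_res)
probs = clf.predict_proba(X_test)[:, 1]

# Sweep the candidate thresholds from the PR curve and keep the one that
# maximizes F2, which weights recall twice as heavily as precision.
_, _, thresholds = precision_recall_curve(y_test, probs)
f2 = [fbeta_score(y_test, (probs >= t).astype(int), beta=2) for t in thresholds]
best_t = thresholds[int(np.argmax(f2))]
print(f"chosen threshold: {best_t:.3f}, F2 = {max(f2):.3f}")
```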
Retaining an existing customer is 5–7× cheaper than acquiring a new one. Predicting which telecom customers will churn enables proactive retention campaigns — high business ROI if the model is both accurate and explainable.
IBM Telco Customer Churn dataset: 7,043 customers, 20 features including contract type, tenure, monthly charges, internet service, and payment method. ~26.5% churn rate.
Encoded categorical variables (OHE + label encoding), engineered TotalCharges / tenure ratio, binned tenure into cohorts, and handled missing values in TotalCharges via median imputation.
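An illustrative slice of that preprocessing, assuming the published Telco column names (the engineered column names here are my own):

```python
import pandas as pd

df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")  # standard Kaggle file name

# TotalCharges is stored as text and is blank for brand-new customers.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

# Engineered ratio and tenure cohorts.
df["ChargesPerTenureMonth"] = df["TotalCharges"] / df["tenure"].clip(lower=1)
df["TenureCohort"] = pd.cut(
    df["tenure"], bins=[0, 12, 24, 48, 72],
    labels=["0-1y", "1-2y", "2-4y", "4-6y"], include_lowest=True
)

# One-hot encode multi-category columns; label-encode the binary target.
df = pd.get_dummies(df, columns=["Contract", "InternetService", "PaymentMethod", "TenureCohort"])
df["Churn"] = (df["Churn"] == "Yes").astype(int)
```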
Logistic Regression → Decision Tree → Random Forest → Gradient Boosting. GridSearchCV for hyperparameter optimization. SHAP values for feature importance and business-level interpretability.
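A compact version of the tuning-plus-explainability step, continuing from the frame above; the grid shown here is deliberately small and illustrative:

```python
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X = pd.get_dummies(df.drop(columns=["Churn", "customerID"]))
y = df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [2, 3]},
    scoring="roc_auc",
    cv=5,
)
grid.fit(X_train, y_train)

# SHAP turns the tuned tree model into per-customer explanations.
explainer = shap.TreeExplainer(grid.best_estimator_)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)
```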
Month-to-month contracts and high monthly charges with low tenure are the strongest churn predictors. SHAP waterfall plots communicated these drivers to non-technical stakeholders clearly, enabling targeted retention offers.
Early-stage breast cancer detection dramatically increases survival rates. The goal: build a high-recall classifier on clinical cell measurements to identify malignant tumors — where a false negative means a missed diagnosis.
Wisconsin Breast Cancer Dataset (WBCD): 569 samples, 30 features (mean, SE, worst for 10 cell-nucleus measurements). 212 malignant / 357 benign. Well-studied benchmark.
StandardScaler normalization across all 30 features. PCA dimensionality reduction to 10 components (95% variance retained). Correlation analysis to remove highly collinear features before SVM training.
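A minimal version of the scaling-plus-PCA step; WBCD ships with scikit-learn, so this sketch is fully reproducible:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scale first (PCA is variance-driven), then keep enough components for ~95% variance.
reducer = make_pipeline(StandardScaler(), PCA(n_components=0.95))
X_reduced = reducer.fit_transform(X)
print(X_reduced.shape)  # roughly (569, 10) on this dataset
```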
KNN → Logistic Regression → SVM (RBF kernel). 10-fold stratified cross-validation for robust evaluation. Precision-Recall curves to select decision threshold maximizing recall on malignant class.
In medical diagnosis, recall is the primary metric — a false positive (unnecessary biopsy) is far less harmful than a false negative (missed cancer). Tuning the classification threshold from 0.5 → 0.35 increased recall by 4.2% with minimal precision loss.
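A sketch of that threshold adjustment, continuing from the reduced features above. Note that in scikit-learn's encoding of WBCD, class 0 is malignant; the exact numbers depend on the split:

```python
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, y, stratify=y, random_state=42
)
svm = SVC(kernel="rbf", probability=True, random_state=42).fit(X_train, y_train)
malignant_prob = svm.predict_proba(X_test)[:, 0]  # column 0 = malignant class

for threshold in (0.5, 0.35):
    pred_malignant = (malignant_prob >= threshold).astype(int)
    rec = recall_score(y_test == 0, pred_malignant)
    prec = precision_score(y_test == 0, pred_malignant)
    print(f"threshold={threshold}: recall={rec:.3f}, precision={prec:.3f}")
```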
Real estate pricing is notoriously opaque. A robust regression model gives buyers, sellers, and agents a data-driven price anchor — reducing information asymmetry and improving market efficiency.
Ames Housing Dataset: 1,460 training samples, 79 features covering lot area, neighborhood, build year, quality ratings, basement specs, garage info, and sale conditions.
Log-transformed skewed features (SalePrice, LotArea), ordinal encoding for quality grades, engineered TotalSF = TotalBsmtSF + 1stFlrSF + 2ndFlrSF, filled 19 missing-value columns using domain-appropriate strategies.
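An illustrative slice of that feature engineering, assuming the Kaggle Ames column names; the quality mapping shown covers only two of the graded columns:

```python
import numpy as np
import pandas as pd

train = pd.read_csv("train.csv")  # Kaggle House Prices training file

# Log-transform the heavily right-skewed target and lot size.
train["SalePrice"] = np.log1p(train["SalePrice"])
train["LotArea"] = np.log1p(train["LotArea"])

# Combined square-footage feature.
train["TotalSF"] = train["TotalBsmtSF"] + train["1stFlrSF"] + train["2ndFlrSF"]

# Ordinal encoding for quality grades; NaN in GarageQual means "no garage".
qual_map = {"Ex": 5, "Gd": 4, "TA": 3, "Fa": 2, "Po": 1}
train["KitchenQual"] = train["KitchenQual"].map(qual_map)
train["GarageQual"] = train["GarageQual"].map(qual_map).fillna(0)
```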
Linear Regression → Ridge/Lasso → Random Forest → XGBoost. Evaluated using RMSE, MAE, and R². Lasso used for automatic feature selection; XGBoost for final performance.
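A condensed version of the evaluation loop, continuing from the engineered frame above and restricted to numeric columns for brevity:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBRegressor

X = train.drop(columns=["SalePrice", "Id"]).select_dtypes("number").fillna(0)
y = train["SalePrice"]  # already log-transformed
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=42)

models = {
    "Lasso": make_pipeline(StandardScaler(), LassoCV(cv=5)),
    "XGBoost": XGBRegressor(n_estimators=500, learning_rate=0.05),
}
for name, model in models.items():
    preds = model.fit(X_tr, y_tr).predict(X_val)
    rmse = np.sqrt(mean_squared_error(y_val, preds))
    mae = mean_absolute_error(y_val, preds)
    print(f"{name}: RMSE={rmse:.4f}  MAE={mae:.4f}  R2={r2_score(y_val, preds):.4f}")
```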
Overall quality rating and total square footage account for ~60% of explained variance. Log-transforming SalePrice reduced the impact of luxury outliers and significantly improved linear model performance — a reminder that distribution assumptions matter.
Manual video surveillance is infeasible at scale. Automated anomaly detection — identifying abnormal pedestrian behavior like running, fighting, or crowd surges — enables real-time public safety alerting.
UCSD Pedestrian Dataset: two subsets (Ped1/Ped2) of surveillance video frames with annotated anomalous events. Frames extracted and preprocessed into temporal sequences.
CNN-based spatial feature extraction from individual frames. Stacked frame sequences for temporal modeling. Reconstruction error thresholding for anomaly scoring.
Convolutional Autoencoder trained on normal-only sequences. Anomaly score = frame reconstruction error. Events exceeding a calibrated threshold flagged as anomalous.
Unsupervised anomaly detection via reconstruction error is powerful but sensitive to threshold calibration. Training exclusively on normal behavior forces the autoencoder to encode the "normal manifold" — making abnormal events high-reconstruction-error outliers by design.
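A compact Keras sketch of the idea, scoring single frames rather than full temporal stacks to keep it short; frame sizes, layer widths, and the percentile threshold are all illustrative:

```python
import numpy as np
from tensorflow.keras import layers, models

def build_conv_autoencoder(input_shape=(128, 128, 1)):
    inputs = layers.Input(shape=input_shape)
    # Encoder: compress the frame toward the "normal manifold".
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(x)
    # Decoder: reconstruct the frame from the compressed representation.
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

def anomaly_scores(model, frames):
    """Per-frame reconstruction error; higher error = more anomalous."""
    reconstructed = model.predict(frames, verbose=0)
    return np.mean((frames - reconstructed) ** 2, axis=(1, 2, 3))

autoencoder = build_conv_autoencoder()
# Trained only on frames containing normal behavior, scaled to [0, 1]:
# autoencoder.fit(normal_frames, normal_frames, epochs=20, batch_size=32)
# threshold = np.percentile(anomaly_scores(autoencoder, normal_frames), 99)
# flags = anomaly_scores(autoencoder, test_frames) > threshold
```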
Social media sentiment is a real-time pulse of public opinion. Accurately classifying tweet sentiment (positive/negative/neutral) has applications in brand monitoring, crisis detection, and financial signal generation.
Twitter sentiment dataset with labeled tweets. Preprocessing involved handling hashtags, mentions, URLs, emojis, and informal language — significantly noisier than formal text corpora.
Text cleaning pipeline: lowercasing, stopword removal, tokenization. Word-to-index mapping with padding for fixed-length sequences. Embedding layer trained end-to-end with the LSTM.
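An illustrative cleaning and tokenization step; the regex choices, vocabulary size, and sequence length are assumptions, and stopword removal is omitted here for brevity:

```python
import re
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

def clean_tweet(text):
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # strip URLs
    text = re.sub(r"@\w+", "", text)                   # strip mentions
    text = text.replace("#", "")                       # keep hashtag words, drop the symbol
    return re.sub(r"[^a-z\s]", "", text).strip()

tweets = ["Loving the new update!!! #happy", "@airline worst service ever http://t.co/xyz"]
cleaned = [clean_tweet(t) for t in tweets]

tokenizer = Tokenizer(num_words=20_000, oov_token="<unk>")
tokenizer.fit_on_texts(cleaned)
padded = pad_sequences(tokenizer.texts_to_sequences(cleaned), maxlen=60, padding="post")
```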
Bidirectional LSTM with dropout regularization. Embedding layer → BiLSTM → Dense → Sigmoid. Compared against TF-IDF + Logistic Regression baseline to quantify deep learning uplift.
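A sketch of the described architecture; the sigmoid output implies a binary positive/negative setup, and the layer widths are illustrative:

```python
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN, EMBED_DIM = 20_000, 60, 128

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),   # trained end-to-end with the LSTM
    layers.Bidirectional(layers.LSTM(64)),     # reads the tweet in both directions
    layers.Dropout(0.5),                       # regularization against overfitting
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```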
Twitter's informal language (slang, sarcasm, abbreviations) makes preprocessing critical. Bidirectional LSTMs outperform unidirectional by capturing context from both directions — especially valuable for short, context-dependent tweet language.
I'm actively seeking ML engineering roles, data science internships, and research collaborations. If you're building something ambitious and need someone who takes both the math and the engineering seriously — let's talk.