✨ Open to new grad roles for 2026
Data Scientist & ML Engineer focused on building high-fidelity data pipelines, optimizing retrieval, and fine-tuning domain-specific models. I treat AI as a data problem rather than a hype cycle, and I build scalable NLP and RAG systems designed to hold up in production, drawing on practical engineering experience at global institutions.
Fine-tuning BioBERT per disease for clinical NER on 40,000 notes. Knowledge graph linking diseases, symptoms, and treatments for RAG-based retrieval.
Processed Fannie Mae data in Parquet with Spark. Clustering, regression, and risk analysis on corporate filings using PySpark and Neo4j GraphRAG.
Content-based image retrieval for 12K artworks using SimCLR. FAISS nearest-neighbor search achieving 96% top-10 similarity accuracy.
CNN, CRNN, and Vision Transformer models classifying animal sounds from Mel-spectrograms. Tuned CNN+Dropout outperformed YAMNet, reaching 92% test accuracy.
Quantified receiver separation, reaction time, and coverage efficiency from frame-level tracking data. K-Means clustering for defensive movement archetypes.
Leveraged YouTube API telemetry to identify a monetization sweet spot and uncover a growth opportunity through CTR optimization across channels.
Built multilingual NLP systems (RoBERTa) and automated legal text workflows on Azure. Integrated Gemini, OpenAI, and Hugging Face APIs to accelerate document analysis. Developed ML prototypes and Tableau dashboards supporting global policy research.
Deployed multimodal AI systems (YOLO, BLIP, PaddleOCR) via FastAPI. Fine-tuned LLaMA 2, Phi-2, and BART for personalized content generation and task automation on AWS.
Built fraud detection, demand forecasting, and customer segmentation models. Automated ETL pipelines on Alibaba Cloud and created dashboards for strategy and marketplace teams.
Scraped 15K+ rural hospital records and built ETL pipelines to surface supply chain bottlenecks. Developed readmission risk models and health dashboards for policy decisions.