Welcome! Please scroll down to have a look at my resume.
Santander Customer Satisfaction
Santander Bank has provided information collected throughout the customer journey and the associated end of journey outcome of customer satisfaction.
The bank has anonymized the features provided. The task is to identify high impact predictors which will be of great use for the bank.
• Analysed a highly imbalanced dataset (most customers satisfied) with 370 anonymized predictors and implemented
Gradient Boosting, Random Forest and Decision Tree models to predict customer satisfaction with the bank’s services.
• Applied permutation importance to the best model to identify the features that matter most for prediction.
Santander Bank can use these to build on its strong areas and improve its weaker ones.
• Best results were obtained with a gradient boosted tree model, which gave an AUC score of 0.9692.
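The permutation importance step can be illustrated with a minimal sketch. It is not the project code: the toy data, the rule-based `toy_model`, and the use of accuracy instead of AUC are assumptions made to keep the example dependency-free. In practice scikit-learn offers this as `sklearn.inspection.permutation_importance`.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy data: the label depends only on feature 0.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]

# Hypothetical stand-in for a fitted classifier.
toy_model = lambda row: int(row[0] > 0.5)

importances = permutation_importance(toy_model, X, y)
```

Shuffling feature 0 destroys the signal the model relies on, so its importance is large, while shuffling feature 1 leaves every prediction unchanged and yields an importance of zero.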
Sklearn, Gradient Boosting, Random Forest, Bagging, Decision Tree, Logistic Regression, kNN classifier
Fault Detection using Hadoop
The roasting machine of concern is an aggregate consisting of 5 chambers of equal size, each with 3 temperature sensors. Raw materials pass through the
kiln in an hour. The goal is to build a model that, on the basis of data arriving every minute, predicts the quality of products produced on the roasting machine
and compares it with the actual quality obtained to detect faults in the roasting machine.
• Created a data pipeline to HDFS using MapReduce to store the cleaned, processed data and used PySpark to access it.
• Built an ANN with TensorFlow to detect faults in a roasting machine by predicting the required quality and comparing it with the obtained quality. The model predicted faults with 96% accuracy.
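The comparison between predicted and obtained quality can be sketched as a simple residual check. The function name `flag_faults`, the `tolerance` threshold and the sample values are hypothetical; the sketch only shows the comparison step, not the ANN itself.

```python
def flag_faults(predicted, actual, tolerance):
    """Return indices of time steps where |predicted - actual| exceeds tolerance."""
    return [
        i
        for i, (p, a) in enumerate(zip(predicted, actual))
        if abs(p - a) > tolerance
    ]

# Hypothetical per-batch quality values (model output vs. lab measurement).
predicted_quality = [400.0, 402.5, 398.0, 401.0]
actual_quality = [401.0, 403.0, 370.0, 400.5]

# Hypothetical threshold: deviations above 10 units count as a fault.
fault_steps = flag_faults(predicted_quality, actual_quality, tolerance=10.0)
```

Here only the third batch deviates by more than the tolerance, so it is the only one flagged.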
Hadoop architecture, HDFS, MapReduce, PySpark, Big Data fundamentals
Happy Whale: Whale and Dolphin Classification
HappyWhale engages citizen scientists to identify individual marine mammals, for fun and for science, using photo ID techniques. A dataset of whales & dolphins
is provided, and the task is to classify them into more than 15,000 unique individual marine mammals from 30 different species.
• Implemented transfer learning using SOTA models, VGG16 and EfficientNetB7, to classify whales and dolphins into 30 species and 15,587 unique individual IDs on a dataset of 50k images (62.8 GB of data).
• Integrated ArcFace to capture fin patterns and identify individual IDs more accurately. Accuracy improved by around 60% compared to the previous model.
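For context, ArcFace adds an angular margin to the target class before the softmax, which pushes embeddings of the same individual closer together. The sketch below computes ArcFace logits for one embedding in plain Python. The `scale` and `margin` values are common illustrative defaults, not the settings used in the project, and the sketch skips the handling of angles where theta + margin exceeds pi.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def arcface_logits(embedding, class_weights, target, scale=30.0, margin=0.5):
    """Cosine logits with an additive angular margin on the target class."""
    e = l2_normalize(embedding)
    logits = []
    for k, w in enumerate(class_weights):
        # Cosine of the angle between the embedding and the class centre.
        cos_theta = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
        if k == target:
            # Penalise the true class: cos(theta + margin) <= cos(theta)
            # as long as theta + margin stays within [0, pi].
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits

# Hypothetical 2-D example: the embedding points exactly at class 0.
logits = arcface_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], target=0)
```

Even with a perfectly aligned embedding, the target logit is reduced from `scale` to `scale * cos(margin)`, so the network has to separate classes by more than the margin to reach a low loss.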
TensorFlow, ANN, CNN, CNN visualization, Autoencoder, Transfer learning
ensAIgnant | Debate God
The application will be used by students between 6th and 12th grade to improve their argumentative writing. It predicts the effectiveness of each argument, and this feedback is given to students so they can improve their arguments.
• Developed a language model that can process the arguments put forth by students and judge the effectiveness of the argument.
• Implemented BERT and DeBERTa V3 models to classify the arguments as Ineffective, Adequate or Effective.
• Deployed the containerized Debate God app on GCP.
• Containerized the application using Docker and deployed it on a Kubernetes cluster through Ansible.
GCP, Kubernetes, MLOps, Ansible, Docker, Word2Vec, RNN, LSTM, Seq2Seq modelling, BERT, Hugging Face API
Sardar Patel College of Engineering, Mumbai, India
B.Tech – Mechanical (CGPA 9.65/10), July 2018 - June 2022
MIT Junior College, Pune, India
12th HSC Board (Distinction), July 2016 - Feb 2018
Sinhgad Spring Dale Public School, Pune, India
10th CBSE (CGPA 10/10), June 2016