Professional Projects

Reconciliation Projects

• Analyzed breaks between the trading system and the data repository in Python (pandas, numpy, and seaborn) and created logic in XCeptor to classify breaks as No Matches or Mismatches

• Automated daily reports for data analysts to ensure account data meets internal data quality, business, and regulatory requirements and standards

• Showed break trends across systems by break reason and day of the week, allowing upper management to track the entire reconciliation process

Personal Projects

Prediction of Employee Promotion

• Built a Decision Tree classification model to predict whether an employee should be promoted, with 89% accuracy, in Python (pandas, seaborn, matplotlib, numpy, scikit-learn, pickle, and SMOTE)

• Performed cross-validation and optimized models to reduce misclassification errors to less than 11%

• Performed data imputation, scaling, and normalization on over 50K historical employee records

Monthly KPI and KRI Reports

• Automated monthly reporting in Python (fnmatch, pandas, numpy, datetime, and os) for senior management and non-technical business partners, identifying key indices and transforming raw data into actionable insights to increase visibility into team performance and reduce business response time for strategic decision making

Restaurant Review Analysis

• Created a Logistic Regression model that classifies restaurant reviews as positive or negative with 77% accuracy in Python (pandas, numpy, nltk, opendatasets, seaborn, matplotlib, wordcloud, collections, re, string, and textblob)

• Applied TF-IDF (Term Frequency-Inverse Document Frequency), Bag-of-Words, and N-gram techniques for text feature extraction

Closure Daily Delta

• Restructured the daily files and created the Delta file by comparing today’s data with yesterday’s in Python (pandas, numpy, datetime, and win32)

• Emailed the cleaned data automatically to the team for closures, allowing data analysts to close out dormant and low-revenue clients on a daily basis

Prediction of Book Ratings

• Built a Random Forest Regression model that estimates book ratings with a Mean Absolute Error of 0.21 in Python (pandas, opendatasets, seaborn, matplotlib, numpy, nltk, wordcloud, pickle, flask, and json)

• Performed data transformation and feature selection to obtain the final dataset