Data Scientist

👋 I specialize in Natural Language Processing, working with an open-source stack. My goal is to create machine learning products that help others by automating communication between humans and machines.

🕵 My recent work 🔑 👇 streamlit

Interests

  • Natural Language Processing
  • AI/Machine Learning
  • Mindfulness
  • Fitness

Education

  • MSc in Data Science, 2019

    CUNY School Of Professional Studies

  • BSc in Applied Mathematics, 2017

    New York City College of Technology

  • ASc in Computer Science, 2015

    New York City College of Technology

Recent Activity

Outputs of My Mind

An R Package for a neat and quick Descriptive Statistical Analysis

A package for a neat and quick descriptive statistical analysis. One very important aspect before any data modeling is exploratory data …

An R Package for tidying Statistical Models

A package for tidying statistical models. This is the third in a series of blog posts I have been writing about the family of packages in …

Categorical Variables using TidyVerse

Dealing with categorical variables using TidyVerse. Whenever I work on data science using R, I always use the tidyverse package. Tidyverse …

Projects

The Showcase of the Immortals

Image Classification using Convolutional Neural Networks, TensorFlow and Keras in Python

Why Convolutional Neural Networks? A Convolutional Neural Network (CNN) is a deep learning model that takes an input image, assigns importance to various aspects of the image through learnable weights, and uses them to differentiate one image from another. A CNN requires much less pre-processing than many other classification algorithms.

Why TensorFlow and Keras? TensorFlow is an open-source library for building machine learning models at large scale. It is one of the most popular libraries for building deep learning models.

Sentiment Analysis of the 2019 Australian Election

I worked with a group of data scientists, most notably Vinicio Haro from Bloomberg, to perform text mining on over 180,000 tweets posted during the 2019 Australian election. We applied sentiment analysis to the tweets to gauge the overall attitude of Twitter users during the election period. We then applied network analysis to identify the users best placed to influence the network and the users who can quickly connect with the wider network.
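
To illustrate the network analysis step, here is a minimal sketch of closeness centrality, one common way to score how quickly a user can reach the rest of a network. It is an assumption that a measure like this fits the project; the actual metrics used are not stated here. The user names and the toy graph are hypothetical.

```python
from collections import deque


def closeness_centrality(graph):
    """Closeness centrality for each node of an undirected graph.

    graph maps each node to the set of its neighbours. A node scores
    higher when its average shortest-path distance to the nodes it can
    reach is smaller.
    """
    scores = {}
    for source in graph:
        # Breadth-first search yields shortest hop counts from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


# Hypothetical toy network of users who mention each other.
graph = {
    "ana": {"ben"},
    "ben": {"ana", "cat", "dan"},
    "cat": {"ben"},
    "dan": {"ben", "eve"},
    "eve": {"dan"},
}

scores = closeness_centrality(graph)
most_central = max(scores, key=scores.get)
```

In this toy graph the user with the highest score is the one sitting closest, on average, to everyone else.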

Recommender System of a Large Dataset in Apache Spark and Python

For this project, I built a recommender system using Alternating Least Squares (ALS) matrix factorization on Apache Spark in the cloud, using R, to recommend books based on an individual’s ratings.
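
The idea behind ALS can be sketched in a few lines of plain Python. This is not the project's Spark code: it is a rank-1 toy version, with one latent factor per user and per book, and the ratings, the `als_rank1` function, and its parameters are all hypothetical. ALS alternates between fixing the book factors and solving for the user factors, then fixing the user factors and solving for the book factors.

```python
def als_rank1(ratings, n_users, n_items, reg=0.01, n_iters=50):
    """Rank-1 ALS on a sparse ratings dict {(user, item): rating}.

    Approximates each observed rating by p[user] * q[item], with an
    L2 penalty `reg` on both factor vectors.
    """
    p = [1.0] * n_users
    q = [1.0] * n_items
    for _ in range(n_iters):
        # Fix q; each p[u] then has a closed-form least-squares solution.
        for u in range(n_users):
            rated = [(i, r) for (uu, i), r in ratings.items() if uu == u]
            num = sum(r * q[i] for i, r in rated)
            den = reg + sum(q[i] ** 2 for i, _ in rated)
            p[u] = num / den
        # Fix p; solve each q[i] the same way.
        for i in range(n_items):
            rated = [(u, r) for (u, ii), r in ratings.items() if ii == i]
            num = sum(r * p[u] for u, r in rated)
            den = reg + sum(p[u] ** 2 for u, _ in rated)
            q[i] = num / den
    return p, q


# Hypothetical ratings: user 1 has not rated book 2.
ratings = {
    (0, 0): 1.0, (0, 1): 2.0, (0, 2): 3.0,
    (1, 0): 2.0, (1, 1): 4.0,
}

p, q = als_rank1(ratings, n_users=2, n_items=3)
predicted = p[1] * q[2]  # estimated rating of user 1 for book 2
```

Because user 1 rates the shared books twice as high as user 0, the predicted rating for the unseen book comes out close to twice user 0's rating of it.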

Predicting Fraudulent Online Transactions

Applied feature engineering and the LightGBM gradient boosting algorithm in Python to detect fraud in customer transactions.