
Image Classification using Convolutional Neural Networks, TensorFlow and Keras in Python

Why Convolutional Neural Networks? A Convolutional Neural Network (CNN) is a deep learning algorithm that takes an input image, assigns importance to various aspects of the image, and differentiates one from another. A CNN requires much less pre-processing than other classification algorithms. Why TensorFlow and Keras? TensorFlow is an open-source library for building machine learning models at large scale. It is one of the most popular libraries for building deep learning models.
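To make "assigning importance to aspects of the image" concrete, here is a minimal sketch of the sliding-window operation at the core of a convolutional layer, written in plain Python rather than TensorFlow or Keras. The function name `conv2d`, the toy image, and the hand-picked edge kernel are assumptions for illustration; in a real CNN the kernel weights are learned during training.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the operation a CNN layer applies."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 image: dark on the left, bright on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernel (an assumption) that responds to vertical edges
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the image changes from dark to bright, which is the sense in which a kernel highlights one aspect of the image.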

Sentiment Analysis of 2019 Australian Election

I work with a group of data scientists, most notably Vinicio Haro from Bloomberg, to perform text mining on over 180,000 tweets posted during the 2019 Australian election. We apply sentiment analysis to the tweets to gauge the overall attitude of Twitter users at the time of the election. We then apply network analysis to identify the Twitter users who are best placed to influence the network and those who can quickly connect with the wider network.

Recommender System of a Large Dataset in Apache Spark and Python

For this project, I build a recommender system using Alternating Least Squares (ALS) matrix factorization in the cloud (Apache Spark) using R to recommend books based on an individual’s ratings.
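As a rough illustration of the idea behind ALS, the sketch below factorizes a tiny ratings table with a single latent factor per user and per book, in plain Python. It is not the Spark implementation used in the project: the function names, the toy ratings, the rank of 1, and the regularization value are all assumptions chosen to keep the example short.

```python
def als_rank1(ratings, n_users, n_items, reg=0.1, n_iters=20):
    """Rank-1 ALS: alternate closed-form updates of user and item factors."""
    user_f = [1.0] * n_users
    item_f = [1.0] * n_items
    for _ in range(n_iters):
        # Hold item factors fixed and solve the least-squares problem per user
        for u in range(n_users):
            num = sum(r * item_f[i] for (uu, i), r in ratings.items() if uu == u)
            den = reg + sum(item_f[i] ** 2 for (uu, i) in ratings if uu == u)
            user_f[u] = num / den
        # Hold user factors fixed and solve the least-squares problem per item
        for i in range(n_items):
            num = sum(r * user_f[u] for (u, ii), r in ratings.items() if ii == i)
            den = reg + sum(user_f[u] ** 2 for (u, ii) in ratings if ii == i)
            item_f[i] = num / den
    return user_f, item_f


def regularized_loss(ratings, user_f, item_f, reg=0.1):
    """Squared error on observed ratings plus an L2 penalty on the factors."""
    sq_err = sum((r - user_f[u] * item_f[i]) ** 2 for (u, i), r in ratings.items())
    penalty = reg * (sum(x * x for x in user_f) + sum(x * x for x in item_f))
    return sq_err + penalty


# Hypothetical (user, book) -> rating data; missing pairs are unrated
ratings = {
    (0, 0): 5, (0, 1): 4,
    (1, 0): 4, (1, 2): 2,
    (2, 1): 5, (2, 2): 3,
}

initial_loss = regularized_loss(ratings, [1.0] * 3, [1.0] * 3)
user_f, item_f = als_rank1(ratings, n_users=3, n_items=3)
final_loss = regularized_loss(ratings, user_f, item_f)

# Predicted rating for a book that user 0 has not rated yet
prediction = user_f[0] * item_f[2]
print(initial_loss, final_loss, prediction)
```

Each half-step solves its least-squares problem exactly, so the regularized loss never increases from one iteration to the next. A production system would use more latent factors and a distributed solver.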

An R Package for a neat and quick Descriptive Statistical Analysis

A package for a neat and quick descriptive statistical analysis. One very important step before any data modeling is exploratory data analysis. Without viewing descriptive statistics or appropriate plots, it’s hard to build the right model. There are several base functions and packages for such analysis and visualization. In several data science papers, I noticed either too much detail, which can cause a reader to lose interest, or poorly rendered plots that do not say much about the data.

An R Package for tidying Statistical Models

A package for tidying statistical models. This is the third in a series of blogs I have been writing about the TidyVerse family of packages. The idea for this blog came to me several times while doing research for data science projects. I often noticed that there are several ways to write the same code in R, which creates a disparity in how readable and understandable the code is.

Categorical Variables using TidyVerse

Dealing with categorical variables using TidyVerse. Whenever I work on data science in R, I always use the tidyverse package. Tidyverse is a collection of R packages that share a common design philosophy and offer data science solutions for data manipulation, exploration, and visualization. It was created by R industry luminary Hadley Wickham, the chief scientist behind RStudio. R packages in the tidyverse are intended to make data scientists more productive.

Data Driven Animation

Introduction: For this project, I will create a data-driven animation to tell the story of the rise and fall of autocratic ruling parties around the world from 1940 to 2015. The dataset was created by Michael K. Miller, an Associate Professor of Political Science and International Affairs at George Washington University. You can find more about the data here.
# Load Libraries
library(tidyverse)
library(ggmap)
library(maps)
library(ggthemes)
library(gganimate)
library(viridis)
Data Preparation: The Autocratic Ruling Parties Dataset (ARPD) includes a range of variables for all autocratic ruling parties in the world from 1940 to 2015.

Exploring Spotify API in R

Statistical hypothesis testing on audio features. One important part of a data scientist’s repertoire is domain knowledge. If you plan to work in the world of advertising, you have to know how to work with various web services and their APIs. In this blog, I will use the spotifyr package to pull track audio features and other information from Spotify’s Web API in bulk. Spotify is a great source of data because it has unique indices for quantifying music.

Functional Programming in R with purrr

Functional Programming in R with purrr. I have been writing about the TidyVerse family of packages. Tidyverse is a collection of R packages that share a common design philosophy and offer data science solutions for data manipulation, exploration, and visualization. Today, I will talk about purrr. As I enter the second-to-last semester of my Master of Science in Data Science program, I have become accustomed to writing several functions for various analyses.

Predicting Fraudulent Online Transactions

Applied feature engineering and the LightGBM gradient boosting algorithm in Python to detect fraud in customer transactions.