For this project, I build a recommender system in R that uses Alternating Least Squares (ALS) matrix factorization on Apache Spark in the cloud to recommend books based on an individual’s ratings.
A package for a neat and quick descriptive statistical analysis
One very important step before any data modeling is exploratory data analysis. Without descriptive statistics or appropriate plots, it's hard to build the right model. There are several base functions and packages for such analysis and visualization. In several data science papers, I have noticed either too much detail, which can cause a reader to lose interest, or poorly rendered plots that do not say much about the data.
A package for tidying statistical models
This is the third in a series of blogs I have been writing about the family of packages in TidyVerse. The idea for this blog came to me repeatedly while I was doing research for data science projects. During that research, I often noticed that there are several ways to write the same code in R, which creates a disparity in how readable and understandable the code is.
Dealing with categorical variables using TidyVerse
Whenever I work on data science using R, I always use the tidyverse package. Tidyverse is a great collection of R packages that share a common design philosophy and offer data science solutions in the areas of data manipulation, exploration, and visualization. It was created by R industry luminary Hadley Wickham, the chief scientist behind RStudio. R packages in the tidyverse are intended to make data scientists more productive.
Introduction
For this project, I will create a data-driven animation to tell the story of the rise and fall of autocratic ruling parties around the world from 1940 to 2015.
The dataset was created by Michael K. Miller, an Associate Professor of Political Science and International Affairs at George Washington University. You can find more about the data here.
# Load Libraries
library(tidyverse)
library(ggmap)
library(maps)
library(ggthemes)
library(gganimate)
library(viridis)

Data Preparation
The Autocratic Ruling Parties Dataset (ARPD) includes a range of variables for all autocratic ruling parties in the world from 1940-2015.
Statistical hypothesis testing on audio features
One important aspect of a data scientist's repertoire is domain knowledge. And if you plan to work in the world of advertising, you have to know how to work with various web services and their APIs. In this blog, I will use the spotifyr package to pull track audio features and other information from Spotify’s Web API in bulk. Spotify is a great source of data because it provides distinctive indices for quantifying music.
Functional Programming in R with purrr
I have been writing about the family of packages from TidyVerse. Tidyverse is a great collection of R packages that share a common design philosophy and offer data science solutions in the areas of data manipulation, exploration, and visualization. Today, I will talk about purrr.
As I enter the second-to-last semester of my Master of Science in Data Science program, I have become accustomed to writing several functions for various analyses.