Data Science


Hallmark Actor Network Graph (2022)

This network graph shows the interconnectedness of Hallmark Countdown to Christmas actors from 2014 to 2021. I used NetworkX in Python to build the network graph and to identify communities using Clauset-Newman-Moore greedy modularity maximization. I then exported the nodes, edges, and coordinates so I could rebuild the graph in Tableau, which I used to make it interactive and accessible. Node color represents the community, and node size is based on the actor's betweenness centrality.
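A minimal sketch of that workflow, not the original script. The cast lists and actor names below are placeholders I made up for illustration, and the row layout for the export is an assumption:

```python
import itertools

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical casts: movie title -> actors (placeholder names, not project data).
casts = {
    "Movie A": ["Actor 1", "Actor 2", "Actor 3"],
    "Movie B": ["Actor 3", "Actor 4"],
    "Movie C": ["Actor 4", "Actor 5", "Actor 6"],
    "Movie D": ["Actor 5", "Actor 6"],
}

# Connect every pair of actors who appeared in the same movie;
# the edge weight counts how many movies the pair shares.
G = nx.Graph()
for actors in casts.values():
    for a, b in itertools.combinations(actors, 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Clauset-Newman-Moore greedy modularity maximization.
communities = greedy_modularity_communities(G)
community_of = {
    actor: index for index, members in enumerate(communities) for actor in members
}

# Betweenness centrality (drives node size) and layout coordinates.
betweenness = nx.betweenness_centrality(G)
positions = nx.spring_layout(G, seed=42)

# Flat rows that could be written to CSV files for a tool such as Tableau.
node_rows = [
    {
        "actor": node,
        "community": community_of[node],
        "betweenness": betweenness[node],
        "x": float(positions[node][0]),
        "y": float(positions[node][1]),
    }
    for node in G.nodes
]
edge_rows = [
    {"source": u, "target": v, "weight": data["weight"]}
    for u, v, data in G.edges(data=True)
]
```

Each row in `node_rows` carries everything a single mark needs (position, color group, size), while `edge_rows` lists the connections to draw between marks.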

View and interact with the graph on Tableau Public

Skills: Data Wrangling, Python, NetworkX, Tableau visualization


Global Entrepreneurship Monitor Hackathon - First Place (2022)

Hackathon participants were given 10 days and 7 years of survey data from the Global Entrepreneurship Monitor (GEM). The dataset contained almost 30,000 records and provided insights into entrepreneurial behaviors and attitudes across the US. My challenge was to provide an analysis and infographic on job creation by entrepreneurs and established business owners. I asked whether entrepreneurship was ‘contagious’ and, if so, how that related to job creation. Using inferential analysis, I found an association between personally knowing an early-stage entrepreneur and every aspect of the job-creation cycle. I completed the analysis in R and tested the statistical significance of my findings using both the Wilcoxon rank sum test and the chi-squared test.
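The analysis itself was done in R. The Python sketch below only illustrates the two tests named above, using SciPy's `ranksums` and `chi2_contingency`. All numbers are simulated or made up, the variable names are hypothetical, and the sketch ignores survey weights:

```python
import numpy as np
from scipy.stats import chi2_contingency, ranksums

rng = np.random.default_rng(1)

# Simulated numeric outcome (for example, expected jobs created) for two groups:
# respondents who know an early-stage entrepreneur and respondents who do not.
knows_entrepreneur = rng.poisson(lam=4.0, size=300)
knows_none = rng.poisson(lam=2.5, size=300)

# Wilcoxon rank sum test: do the two groups differ in distribution?
rank_stat, p_ranksum = ranksums(knows_entrepreneur, knows_none)

# Made-up 2x2 contingency table.
# Rows: knows an entrepreneur (yes, no). Columns: created jobs (yes, no).
table = np.array([[120, 180],
                  [60, 240]])

# Chi-squared test of independence between the two categorical variables.
chi2_stat, p_chi2, dof, expected = chi2_contingency(table)
```

The rank sum test suits a numeric or ordinal outcome compared across two groups, while the chi-squared test suits two categorical variables.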

Read more about the Hackathon and my win

Skills: R, inferential analysis, weighted survey data, statistical significance testing, data visualization


Forest Cover Type Prediction, Kaggle (2021)

For this Machine Learning final project, I predicted forest cover types using a Kaggle dataset. This was a group project with one collaborator. We built four model types in R: KNN, Naive Bayes, Random Forest, and a Stacked Ensemble. A detailed paper, written with a non-technical audience in mind, explains and interprets the models and discusses the pros and cons of each model type.
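The project models were built in R. As a rough illustration of the same four model types, here is a scikit-learn sketch on synthetic data. The dataset, the three-class setup, the hyperparameters, and the logistic regression meta-learner are all assumptions for the example, not the project's choices:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the Kaggle data (class count chosen arbitrarily).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

base_models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# The stacked ensemble feeds cross-validated base-model predictions
# into a final estimator that learns how to combine them.
stacked = StackingClassifier(
    estimators=list(base_models.items()),
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Fit each model and record its accuracy on the held-out split.
accuracy = {}
for name, model in {**base_models, "stacked": stacked}.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)
```

Comparing the entries of `accuracy` side by side is one simple way to frame the pros and cons of each model type for a non-technical reader.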

Download the paper and R script

Skills: R, machine learning theory, explanation to a non-technical audience, data visualization


Recipe Analysis, Kaggle + Guardian API (2021)

For this Advanced Python Programming final project, I used two data sources: a Food.com dataset containing 10,000 recipes and over 230k reviews, and recipes published by the Guardian throughout 2020. I analyzed the data from my perspective as a food blogger who develops and posts recipes throughout the year. I posed the following questions, each paired with its corresponding analysis:

  • Can the popularity of a recipe be predicted by its ingredients? Constructed Naive Bayes and Logistic Regression models

  • What factors (other than recipe quality) might influence review sentiment? Performed sentiment analysis by season and ingredients

  • What types of recipes are popular throughout the different seasons? Utilized the Guardian API to pull the article headlines and created word clouds by season
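For the first question, a bag-of-words approach is one plausible way to predict popularity from ingredients. The sketch below is an assumption about the general shape of such a model, not the notebook's code; the recipes and popularity labels are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training data: one ingredient string per recipe, 1 = popular.
ingredients = [
    "butter sugar flour chocolate",
    "chocolate cream sugar eggs",
    "butter flour sugar vanilla",
    "kale quinoa lemon",
    "tofu kale ginger",
    "quinoa lemon cucumber",
]
popular = [1, 1, 1, 0, 0, 0]

# Each pipeline turns ingredient text into word counts, then fits a classifier.
models = {
    "naive_bayes": make_pipeline(CountVectorizer(), MultinomialNB()),
    "logistic_regression": make_pipeline(CountVectorizer(), LogisticRegression()),
}

new_recipes = ["chocolate butter sugar", "kale lemon tofu"]
predictions = {
    name: model.fit(ingredients, popular).predict(new_recipes).tolist()
    for name, model in models.items()
}
```

Keeping the vectorizer inside each pipeline means new recipes are encoded with exactly the vocabulary learned during training.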

Download the paper and Jupyter Notebook

Skills: Python, scikit-learn, data wrangling, API, machine learning, NLTK


Marketing Analytics - EDA (2022)

This paper explores the relationship between various customer attributes and purchase behavior for spenders at a general merchandise retailer. Table analysis and correlation analysis were performed in SAS Studio. P-values from the chi-squared and correlation tests confirmed statistical significance.

Download the paper

Skills: marketing analytics, SAS Studio, exploratory data analysis, statistical significance testing

US Aggregate Consumption Estimation (2022)

Given data from the St. Louis Federal Reserve and the Economic Report of the President, I modeled US aggregate consumption as a function of disposable income and the real interest rate. Serial correlation was tested for and corrected with GLS (Prais-Winsten). R was used to complete the analysis.
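The paper's analysis was done in R. To show the idea behind the correction, here is a hand-written iterated Prais-Winsten estimator in NumPy, run on simulated data. The series, coefficients, and function names are all assumptions for illustration, not the paper's data or results:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated regressors: trending disposable income and a noisy real interest rate.
income = 100.0 + np.cumsum(rng.normal(1.0, 0.5, n))
rate = rng.normal(2.0, 1.0, n)

# AR(1) errors with rho = 0.7, so plain OLS residuals are serially correlated.
shocks = rng.normal(0.0, 1.0, n)
errors = np.zeros(n)
for t in range(1, n):
    errors[t] = 0.7 * errors[t - 1] + shocks[t]

consumption = 5.0 + 0.8 * income - 0.5 * rate + errors
X = np.column_stack([np.ones(n), income, rate])
y = consumption


def ols(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def durbin_watson(resid):
    """Durbin-Watson statistic; values well below 2 suggest positive serial correlation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


def prais_winsten(X, y, tol=1e-8, max_iter=100):
    """Iterated Prais-Winsten GLS for AR(1) errors (assumes |rho| < 1)."""
    beta = ols(X, y)
    rho = 0.0
    for _ in range(max_iter):
        resid = y - X @ beta
        new_rho = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
        # Quasi-difference the data; the first observation is rescaled, not dropped.
        scale = np.sqrt(1.0 - new_rho ** 2)
        X_star = np.vstack([scale * X[0], X[1:] - new_rho * X[:-1]])
        y_star = np.concatenate([[scale * y[0]], y[1:] - new_rho * y[:-1]])
        beta = ols(X_star, y_star)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return beta, rho


beta_ols = ols(X, y)
dw_ols = durbin_watson(y - X @ beta_ols)
beta_pw, rho_hat = prais_winsten(X, y)
```

Retaining the rescaled first observation is what distinguishes Prais-Winsten from Cochrane-Orcutt, which drops it.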

Download the paper

Skills: R, econometrics, serial correlation, time series


Baking