Data Analytics

Amazon Fine Foods Review Analysis

Longitudinal data makes it possible to see how things change over time. Using a longitudinal dataset of food reviews from Amazon, I attempt to understand and visualize trends in food over the years.
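As a rough illustration of what a yearly trend looks like in code, the sketch below groups reviews by calendar year and averages their star ratings. It assumes each review can be reduced to a (Unix timestamp, score) pair; the function name `mean_score_by_year` and the sample rows are hypothetical, not taken from the project.

```python
from collections import defaultdict
from datetime import datetime, timezone


def mean_score_by_year(reviews):
    """Return {year: mean score} for an iterable of (unix_time, score) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # year -> [sum of scores, review count]
    for unix_time, score in reviews:
        year = datetime.fromtimestamp(unix_time, tz=timezone.utc).year
        totals[year][0] += score
        totals[year][1] += 1
    return {year: s / n for year, (s, n) in sorted(totals.items())}


# Hypothetical sample rows: (Unix timestamp, star rating)
sample = [
    (1303862400, 5),  # a review written in 2011
    (1346976000, 1),  # a review written in 2012
    (1350777600, 4),  # a review written in 2012
]
trend = mean_score_by_year(sample)
print(trend)
```

The resulting dictionary is ordered by year, so it can be handed directly to a plotting library to draw a line chart.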

Fake Job Classification

According to the US Department of Labor, the unemployment rate in the United States stood at 11.1% as of June 2020. Because job postings are now made online, most companies either post directly to job boards or have their job data pulled by job aggregators. However, not every posting is genuine: some are fraudulent postings used to harvest personal data or other sensitive information from desperate job seekers. Using Natural Language Processing, we built a predictive model to classify potentially fraudulent jobs.
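The description above does not say which model was used, so the following is only a minimal sketch of one common approach to text classification: a multinomial naive Bayes classifier over word counts, written with the standard library. The class name `NaiveBayes`, the labels, and the four toy postings are all made up for illustration and are not the project's data or model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # Log prior plus the sum of smoothed log likelihoods
            score = math.log(self.doc_counts[c] / n_docs)
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[c][word]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy, hand-written postings (hypothetical; not real data)
texts = [
    "Earn money fast from home, no experience needed, send bank details",
    "Wire a small fee to start, unlimited earnings guaranteed",
    "Software engineer to build backend services with our platform team",
    "Registered nurse for night shifts at the regional hospital",
]
labels = ["fraudulent", "fraudulent", "legitimate", "legitimate"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("Send your bank details to earn money fast"))
print(model.predict("Backend engineer for our platform team"))
```

On real postings, a held-out test set and a metric that accounts for class imbalance (fraudulent postings are usually rare) would be needed to judge such a model.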

Seoul Pollution Forecasting

Air pollution is a growing problem around the world. Many fast-growing countries are increasingly encountering air pollution problems due to the rapid urbanization and modernization of their societies. The Seoul metropolitan government released data from its air pollution monitoring system covering a period of 3 years. We attempt to forecast future pollution levels of various analytes using a vector autoregression model.
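To make the idea of vector autoregression concrete, here is a minimal sketch that fits a lag-one model, y_t = c + A y_{t-1}, by ordinary least squares and then forecasts forward. The lag order, the function names (`solve`, `fit_var1`, `forecast`), and the noiseless two-variable demo series are assumptions chosen for illustration; the project's actual lag order, variables, and tooling are not stated above.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_var1(series):
    """Fit y_t = c + A y_{t-1} by least squares; returns (c, A)."""
    k = len(series[0])
    X = [[1.0] + list(prev) for prev in series[:-1]]  # intercept + lagged values
    Y = series[1:]
    p = k + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    c, A = [], []
    for var in range(k):  # one regression per variable
        xty = [sum(row[i] * y[var] for row, y in zip(X, Y)) for i in range(p)]
        beta = solve(xtx, xty)
        c.append(beta[0])
        A.append(beta[1:])
    return c, A


def forecast(c, A, last, steps):
    """Iterate the fitted model forward from the last observation."""
    out = []
    for _ in range(steps):
        last = [c[i] + sum(A[i][j] * last[j] for j in range(len(last)))
                for i in range(len(last))]
        out.append(last)
    return out


# Hypothetical noiseless demo: simulate two series from known coefficients
true_c = [1.0, 2.0]
true_A = [[0.5, 0.2], [0.1, 0.6]]
series = [[10.0, 0.0]]
for _ in range(8):
    series.extend(forecast(true_c, true_A, series[-1], 1))

c_hat, A_hat = fit_var1(series)
next_steps = forecast(c_hat, A_hat, series[-1], 3)
print(c_hat, A_hat, next_steps)
```

Because the demo series contains no noise, the fit should recover the simulated coefficients almost exactly; real sensor data would also call for stationarity checks and a data-driven choice of lag order.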