Face masks are one of the key strategies recommended by the CDC to prevent the spread of COVID-19. However, not everybody follows those guidelines. An automated machine learning approach can detect whether people are wearing face masks, which makes it possible to monitor mask compliance remotely and improve safety measures. In this post, we use YOLOv4 to perform face mask object detection.
Q-learning is an off-policy reinforcement learning algorithm that has been popularized by deep Q-networks (DQN), best known for learning to play Atari games. OpenAI hosts a variety of environments for reinforcement learning models to experiment with. This post gives an example of how to implement a Q-learning algorithm for Atari games.
Longitudinal data makes it possible to understand how things change over time. Using a longitudinal dataset of Amazon reviews, I attempt to understand and visualize food trends over the years.
The unemployment rate in the United States was 11.1% as of June 2020, according to the US Department of Labor. Because most job postings are now published online, companies can post directly to job boards or have their job data pulled by job aggregators. However, not all postings are genuine: some are fraudulent postings used to harvest personal data or other sensitive information from desperate job seekers. Using natural language processing, we built a predictive model to classify potentially fraudulent job postings.
Air pollution is a growing problem around the world. Many fast-growing countries increasingly face air pollution problems due to the rapid urbanization and modernization of their societies. The Seoul Metropolitan Government released data from its air pollution monitoring system covering a period of 3 years. We attempt to forecast future levels of various analytes using a vector autoregression model.
Neural Machine Translation (NMT) is a relatively new approach to machine translation. This project is an attempt to build a translation model using the seq2seq architecture to perform zero-shot translation between three different languages.
Apache Airflow is a highly rated data orchestration tool. In this project, I prototype the use of Apache Airflow for bioinformatics as a proof of concept, using a common metagenomic pipeline. Although not featured here, this was part of a larger architecture in which the data is passed into Amazon RDS and visualized using Tableau.
Variant annotation of single nucleotide polymorphisms is very important for understanding how a mutation at a given location can cause downstream effects. Oncolonnator was built to take in variant call format (VCF) files and annotate the mutations with their potential effects using the ExAC REST API.
HLA-PRG-LA is an algorithm built to genotype human leukocyte antigen (HLA) types from whole-genome and whole-exome next-generation sequencing data. The installation is quite involved and the algorithm is resource intensive. The algorithm was containerized so that it can be deployed quickly and scaled across potentially large compute clusters.