I am very interested in the application of deep learning for visual understanding and reinforcement learning tasks. I've been excited about artificial intelligence since my 2nd year in college and the recent resurgence of AI has made now a very exciting time to be a researcher. Currently, my research focuses on finding more sample-efficient and principled ways for reinforcement learning algorithms to learn from their environment. My previous work focused on domain adaptation and transfer learning for deep neural networks. Particulary, I am interested in knowledge transfer from synthetic environments, such as video games, to real-world scenarios.
With the surge of autonomous driving efforts in industry, semantic understanding of road scenes has become more commercially relevant than ever. Semantic segmentation, or dense pixel prediction, has become a popular approach for scene understanding, as the meteoric rise of deep convolutional networks (CNNs) has led to significant progress in recent years. However, these deep networks require large amounts of data, labeled at the pixel level, which can be quite cumbersome to collect for the driving domain. As a result, many have looked to realistic virtual world simulators (i.e. video games), for collecting large volumes of labeled data. Despite the visual similarity between real and synthetic domains, models trained on synthetic street scenes show poor results when evaluated on real-world data. To compensate for this visual shift, supervision from the real domain is necessary. In this work, I examine how much real-world supervision is appropriate for effective transfer of models pretrained on synthetic. Utilizing recent methods of supervised and semi-supervised transfer for fully convolutional networks (FCNs), I achieve promising results with a very small amount of labeled data(~50 images). By also quantitatively measuring different levels of domain shift, I reveal how simple metrics about a synthetic domain can be used to infer the ability of network features to transfer.
Recent studies have shown that features from deep convolutional neural networks learned using large labeled datasets, like ImageNet, provide effective representations for a variety of visual recognition tasks. They achieve strong performance as generic features and are even more effective when fine-tuned to target datasets. However, details of the fine-tuning procedure across datasets and with different amount of labeled data are not well-studied and choosing the best fine-tuning method is often left to trial and error. In this work we systematically explore the design-space for fine-tuning and give recommendations based on two key characteristics of the target dataset: visual distance from source dataset and the amount of available training data. Through a comprehensive experimental analysis, we conclude, with a few exceptions, that it is best to copy as many layers of a pre-trained network as possible, and then adjust the level of fine-tuning based on the visual distance from source.
Deep Convolutional Generative Adversarial Networks (DCGANs) have become popular in recent months for their ability to effectively capture image distributions and generate realistic images. Recent work has also shown that conditional information provided to Generative Adversarial Networks (GANs) allows for deterministic control of generated images simply through the regulation of the conditional vector. Although many have hinted that language models could be used as conditional information for generating images from words, there are very few attempts using GANs, much less DCGANs. In this project we explore and analyze the results of image generation by encoding captions as conditional information to our DCCGAN. We use a subset of the MSCOCO dataset to evaluate our results.
PDF STATUS: COMPLETED
Aside from research, I've had the chance to apply machine learning and data mining methods for numerous applications. Here are some of my other projects from the last few years
Here I used unsupervised and supervised machine learning methods, including autoregressive hidden markov models to predicting each teams wins and losses over the regular season. These predictions used team game-by-game statistics, opponent strength, and player baseline talent level. The image on the left shows my predictions for the current NBA season.
For this project, my teammates and I ran TextRank and PageRank algorithms for custom web crawlers to extract key phrases and the most popular words from the website of the university news publication, the DailyCal.