I am currently employed as an Associate Computer Scientist/Engineer III in the
Marconi-Rosenblatt AI/ML Innovation Lab at ANDRO Computational
Solutions LLC, a defense contracting firm in Rome, NY. At ANDRO, I lead a team of engineers working to build a
vision-based collision avoidance and navigation subsystem for autonomous UAV applications. Our solution combines
traditional vision-based autonomy components with machine-learning-based perception modules, reaping the
robustness and generalization benefits of both approaches. The subsystem is powered by optimized CUDA C++ code and
hardware-accelerated vision models deployed on an embedded NVIDIA Jetson platform, with TensorRT used for further
model optimization. Link to project press release.
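For a sense of what that deployment step looks like in practice, here is a minimal sketch of building an FP16 TensorRT engine with the Python API, assuming a perception model has first been exported to ONNX; the file names are placeholders and this is not the project's actual build pipeline.

```python
import tensorrt as trt  # NVIDIA TensorRT Python API (available on Jetson via JetPack)

# Minimal sketch: build an FP16 engine from an ONNX export of a vision model.
# "model.onnx" / "model.engine" are placeholder file names, not project artifacts.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # half precision for Jetson GPU throughput
engine_bytes = builder.build_serialized_network(network, config)

with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```

The serialized engine can then be loaded at runtime by the on-board inference code, avoiding any rebuild cost on the embedded platform.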
In 2021 I graduated with a Master's in Computer Science from the University of
Massachusetts at Amherst, focusing on reinforcement learning (RL). During my time at UMass, I was fortunate enough to conduct research with
Dr. Shlomo Zilberstein on safe AI methods for
avoiding "dead ends" in online planning and RL. My Master's work culminated in research on offline reinforcement learning
methods for learning in constrained MDPs, during which I was advised by Professors
Ina Fiterau and
Bruno Castro da Silva. Our work, "Constrained Offline Policy
Optimization", introduces a novel constrained policy projection algorithm that, given a reward-optimal policy, finds
the cost-feasible policy closest to that reward-maximizing policy. This work will be featured at ICML 2022.
In May 2017 I graduated from Cornell University with a BS in Computer Science and thereafter joined ANDRO full time,
where I worked on projects involving automatic modulation classification and multi-task learning for signal intelligence
before moving to my current autonomy project.
Some of my research interests include:
Some of my self-study interests include:
At ANDRO, I lead a team of engineers working to build a vision-based collision avoidance and navigation subsystem for autonomous UAV applications for the US Navy. We use vision-based autonomy algorithms to aid in navigating a UAV through tactical environments in GPS-denied or lost-data-link scenarios. Additionally, to enhance human-machine teaming, we have developed vision-based natural user interfaces (NUIs) that allow humans to interact with the UAV via natural gestures (hand signals) and chromatic symbols.
In this work we introduce Constrained Offline Policy Optimization (COPO), an offline policy optimization algorithm for learning in MDPs with cost constraints. COPO is built upon a novel offline cost-projection method, which we formally derive and analyze. Our method improves upon the state of the art in offline constrained policy optimization by explicitly accounting for distributional shift and by offering non-asymptotic confidence bounds on the cost of a policy. These formal properties are superior to those of existing techniques, which only guarantee convergence to a point estimate. We empirically demonstrate that COPO achieves state-of-the-art performance on discrete and continuous control problems while providing these stronger and more robust theoretical guarantees.
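At a high level, the projection step can be sketched as follows, in simplified and purely illustrative notation rather than the exact objective or divergence used in the paper:

```latex
\pi_{\mathrm{proj}} \;=\; \arg\min_{\pi \in \Pi} \; D\!\left(\pi \,\|\, \pi_r\right)
\quad \text{s.t.} \quad
J_C(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^t \, C(s_t, a_t)\right] \;\le\; d
```

Here \(\pi_r\) stands for the reward-maximizing policy, \(D\) for a divergence between policies, \(C\) for the cost function, and \(d\) for the cost budget; all of these symbols are assumptions for illustration only.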
The offline reinforcement learning setting, in which policies are learned from a static data set, introduces challenges associated with the inability to sample data from the learned policy. Among such challenges, overestimation of state-action values and distributional shift are the most detrimental to learning policies that perform well once deployed. In this work, we introduce a novel offline learning algorithm that directly addresses both overestimation and distributional shift without restricting value estimates to the data distribution. Our algorithm, titled Initial and Semi-Implicit Q-Learning (ISIQL), learns using value targets constructed from a mixture of estimates from both the data distribution and the current policy's action distribution, thereby allowing for policy improvement outside of the behavior distribution when possible. Our value objective additionally incorporates an initial state-action value term which, we show, mitigates distributional shift. We motivate the use of these learning components by connecting them to prior work, and show various ways a stochastic policy may be extracted from the learned value functions. Lastly, we show that ISIQL achieves state-of-the-art performance on online MuJoCo benchmark tasks and offline D4RL data sets, most notably offering a 10% performance gain on offline locomotion and maze tasks.
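As a rough illustration of the mixed-target idea (this is not ISIQL's exact update; `q_net`, `policy`, and `mix` are hypothetical names introduced here), a bootstrap target blending an estimate under the logged next action with one under the current policy's action might look like:

```python
import torch

@torch.no_grad()
def mixed_bootstrap_target(reward, done, next_obs, next_action_from_data,
                           q_net, policy, gamma=0.99, mix=0.5):
    """Illustrative sketch of a blended value target.

    Blends a value estimate under the dataset's next action with one under an
    action sampled from the current policy; names and weighting are placeholders,
    not the paper's exact formulation.
    """
    q_data = q_net(next_obs, next_action_from_data)    # estimate under the data distribution
    q_pi = q_net(next_obs, policy.sample(next_obs))    # estimate under the current policy
    blended = (1.0 - mix) * q_data + mix * q_pi
    return reward + gamma * (1.0 - done) * blended
```

Weighting toward the data-distribution estimate keeps the target conservative, while the policy-distribution term is what permits improvement beyond the behavior policy.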
At ANDRO, our team is in the process of developing a machine-learning-based RF signal detector and classifier for the Army. The envisioned final product should aid EM spectrum analysts in signal identification by providing a multitude of estimated signal characteristics to the user simultaneously. Our approach builds on our automatic modulation classification (AMC) technology by extending the solution to a multi-task learning paradigm.
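To illustrate the multi-task paradigm (the specific heads, tasks, and layer sizes below are hypothetical examples, not the fielded architecture), a shared backbone with per-task output heads might be sketched as:

```python
import torch
import torch.nn as nn

class MultiTaskSignalNet(nn.Module):
    """Shared 1-D convolutional backbone over raw I/Q samples with one head per task.

    The task heads and sizes are illustrative placeholders for the kinds of signal
    characteristics a multi-task extension of AMC could estimate jointly.
    """
    def __init__(self, num_mod_classes=11, num_emitter_classes=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv1d(2, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.mod_head = nn.Linear(64, num_mod_classes)          # modulation classification
        self.emitter_head = nn.Linear(64, num_emitter_classes)  # e.g. emitter/protocol type
        self.bandwidth_head = nn.Linear(64, 1)                  # e.g. bandwidth regression

    def forward(self, iq):  # iq: (batch, 2, num_samples) with I and Q channels
        z = self.backbone(iq)
        return self.mod_head(z), self.emitter_head(z), self.bandwidth_head(z)
```

Training would then sum a per-task loss (cross-entropy for the classification heads, a regression loss for bandwidth), so a single forward pass presents several estimated signal characteristics to the analyst at once.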