I am a postdoctoral researcher in the Kern Lab at the University of Oregon. My work lies at the intersection of population genomics, bioinformatics, and computer science, where I develop computational tools that help us better understand how organisms adapt to their environments.
Much of my recent work has focused on using machine learning to infer the genome-wide landscape of recombination rates. Recombination plays an essential role in many evolutionary and biological processes of key interest to researchers in evolutionary, population, and medical genetics. From mapping disease-associated variants in humans to inferring demographic histories, tools for accurately estimating recombination rates are indispensable. For decades, population geneticists have worked to develop methods that estimate recombination rates from genetic data, and while a number of these methods are in wide use, many suffer from known limitations, such as poor performance with small sample sizes or under model misspecification.
My work builds on recent breakthroughs in machine learning, in particular the development of deep artificial neural networks, and applies these technologies to the problem of inferring recombination rates. In addition to providing an end-to-end pipeline for inferring recombination rates that is robust to both small sample sizes and model misspecification, I show that genome-wide recombination landscapes are highly correlated among different populations of Drosophila melanogaster, and that differences in recombination rates between populations can be attributed in part to the effects of polymorphic inversions in this species.
Machine learning, and deep learning in particular, has the potential to revolutionize the fields of evolutionary and population genomics. My research seeks to be at the forefront of this sea change and to lead the development of tools and methods that will accelerate our understanding of the adaptive process.
A software pipeline that uses a recurrent neural network to infer the genome-wide landscape of recombination rates directly from raw polymorphism data. It takes phased or unphased VCFs as input and models the genotypes directly, avoiding the data compression that comes with reducing the data to summary statistics.
A software pipeline that uses paired-end Illumina sequence data both to discover novel transposable elements (TEs) and to genotype TEs. Input samples can be sequenced either individually or as a pool, and multiple samples can be analyzed simultaneously to improve sensitivity.
A community-maintained library of standardized population genetic simulation models spanning diverse taxa. These simulations are currently being used to benchmark popular tools for estimating historical population sizes.
A software tool that simulates the insertion and deletion of transposable elements in a population of chromosomes, including the simulation of target site duplications. These simulations can be used to benchmark the accuracy of TE-calling software packages.
I received my Ph.D. from Indiana University (IU), where I was broadly trained in evolutionary biology and population genetics. There I worked on a range of evolutionary topics, such as characterizing the role of compensatory evolution in the evolution of mitochondrial-nuclear gene complexes and identifying how spatially varying selection shapes the genome-wide distribution of transposable elements in Drosophila. At IU I also took formal coursework in bioinformatics and machine learning, and my interest in these areas has fueled my desire to develop computational tools to better understand adaptation.
I currently live in Eugene, Oregon, with my wife, Sarah, and our dog, Sandy. I enjoy camping, biking, and going to see live music. My CV and contact information can be found here.