Bio

My CV.

I am an Assistant Professor with the TISLab in the Department of Genetics at the UNC Chapel Hill School of Medicine. Previously I was an Assistant Professor in the CU Anschutz Department of Biomedical Informatics and a Senior Faculty Research Assistant with the Center for Genome Research and Biocomputing at Oregon State University. I earned my M.S. (2009) and Ph.D. (2012) from the Department of Computer Science and Engineering at the University of Notre Dame, and my bachelor's degree in Computer Science from Northern Michigan University.

Homelab

Sometimes you just get tired of paying AWS & DigitalOcean. Here is a quick sketch of my homelab, inspired by the good folks at r/homelab.

The main goal of the project was to enable quick creation of virtual machines (or containers), each accessible by its own subdomain (project1.mydomain.net, project2.mydomain.net, etc.), all behind a single dynamic IP address. I managed to make it work pretty well, with a little help from NGINX for reverse proxying, pfSense for local DNS and routing, ddclient for dynamic DNS, Proxmox for hypervisor management, and FreeNAS for shared storage.

Random Forests & Gradient Boosting

In an earlier post we considered machine learning “models” as functions producing predictions from data, and training as the production of these functions by higher-order functions. From there we built regression trees: models created recursively by determining 1) a splitting column (column 1 or 2 in this case) and 2) a good value in that column to split the dataset on. At each split, we find a column and value that produce two relatively homogeneous sets of y values (in the sense that the values in each y subset can be well predicted from the column values).
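To make the split search concrete, here is a minimal sketch in Python (the posts themselves use R). It assumes a small in-memory dataset of numeric rows and scores each candidate (column, value) pair by the total sum of squared errors (SSE) of the two resulting y subsets, with lower SSE meaning more homogeneous. In keeping with the models-as-functions idea, the hypothetical `train_tree` returns a prediction function rather than a data structure. Columns are 0-indexed here, so the post's column 1 is column 0; the names `sse`, `best_split`, and `train_tree` are illustrative, not taken from the post.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Model = Callable[[Row], float]  # a "model" is just a function from a row to a prediction


def sse(ys: List[float]) -> float:
    """Sum of squared errors around the mean; lower means more homogeneous."""
    if not ys:
        return 0.0
    avg = sum(ys) / len(ys)
    return sum((y - avg) ** 2 for y in ys)


def best_split(xs: List[Row], ys: List[float]) -> Tuple[int, float]:
    """Return the (column, value) whose split gives the lowest total SSE."""
    best = None
    for col in range(len(xs[0])):
        for value in sorted({row[col] for row in xs}):
            left = [y for row, y in zip(xs, ys) if row[col] <= value]
            right = [y for row, y in zip(xs, ys) if row[col] > value]
            if not right:  # splitting at the max value leaves one side empty
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, col, value)
    if best is None:
        raise ValueError("all rows are identical; no split possible")
    return best[1], best[2]


def train_tree(xs: List[Row], ys: List[float], depth: int = 2) -> Model:
    """Higher-order training: recursively build and return a prediction function."""
    avg = sum(ys) / len(ys)
    if depth == 0 or len(set(ys)) == 1:
        return lambda row: avg  # leaf: predict the mean y
    try:
        col, value = best_split(xs, ys)
    except ValueError:
        return lambda row: avg
    go_left = [row[col] <= value for row in xs]
    left = train_tree([r for r, g in zip(xs, go_left) if g],
                      [y for y, g in zip(ys, go_left) if g], depth - 1)
    right = train_tree([r for r, g in zip(xs, go_left) if not g],
                       [y for y, g in zip(ys, go_left) if not g], depth - 1)
    return lambda row: left(row) if row[col] <= value else right(row)
```

For example, with rows `[(1, 5), (2, 5), (3, 1), (4, 1)]` and y values `[1.0, 1.1, 5.0, 5.2]`, `best_split` picks column 0 at value 2, and `train_tree(..., depth=1)` returns a function that predicts about 1.05 for small first-column values and 5.1 for large ones.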

Regression Trees & Bagging (more functional R)

I recently ran across this excellent article explaining gradient boosting in the context of regression trees. The article concludes by describing how the technique implements a gradient-descent process, but what I find most fascinating is the concept of “functional modeling”–building machine learning models from other models as building blocks.

This post explores that idea by implementing regression trees in base R (with a little visualization help from ggplot2, dplyr, and tidyr) with functional programming concepts, including a technique called bootstrap aggregating. In a future post we’ll extend to random forests and gradient boosting machines.
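Bootstrap aggregating (bagging) fits naturally into the same functional style: take a trainer, and return a new trainer that fits several models on bootstrap resamples (rows drawn with replacement) and averages their predictions. This is a minimal Python sketch under that description, not the post's R code. The single-feature, one-split base learner `train_stump` and the `bag` helper are hypothetical stand-ins chosen to keep the example short.

```python
import random
from statistics import mean
from typing import Callable, List

Model = Callable[[float], float]
Trainer = Callable[[List[float], List[float]], Model]


def train_stump(xs: List[float], ys: List[float]) -> Model:
    """Hypothetical base learner: a one-split regression tree on a single feature."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:
            continue
        lm, rm = mean(left), mean(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # every x is identical: just predict the overall mean
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bag(trainer: Trainer, n_models: int, seed: int = 0) -> Trainer:
    """Higher-order: turn any trainer into a bootstrap-aggregated trainer."""
    def bagged_trainer(xs: List[float], ys: List[float]) -> Model:
        rng = random.Random(seed)
        n = len(xs)
        models = []
        for _ in range(n_models):
            idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
            models.append(trainer([xs[i] for i in idx], [ys[i] for i in idx]))
        return lambda x: mean(m(x) for m in models)  # average the models' predictions
    return bagged_trainer
```

For example, `bag(train_stump, 25)(xs, ys)` returns a function you can call on a new x. Because `bag` only assumes its argument maps data to a prediction function, any other trainer can be swapped in; the fixed `seed` just makes the resampling reproducible.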

Automatic Differentiation & Functional Operators in R

I’ve been studying up on deep learning recently (I know, trendy), and I learned something along the way that I think is just incredible.¹

First, a little background: deep learning models are artificial neural networks, represented as potentially thousands of nodes with millions of weighted connections between them. Input numbers are fed into some nodes on one side and, after winding through the nodes and weighted connections, output numbers pop out of some nodes on the other side. The goal is to adjust the connection weights so that the outputs are what we want for any given input.
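Adjusting the weights requires the derivative of some error measure with respect to each weight. One way to get those derivatives automatically is forward-mode automatic differentiation with dual numbers, sketched below in Python for a hypothetical one-weight, one-input "network" `sigmoid(w * x)` trained by gradient descent on a squared error. This is an illustration under those assumptions, not necessarily the technique the post develops; deep learning libraries typically use reverse mode instead, which scales better to millions of weights.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative w.r.t. one chosen input (forward-mode AD)."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def sigmoid(x: Dual) -> Dual:
    s = 1.0 / (1.0 + math.exp(-x.val))
    return Dual(s, x.der * s * (1.0 - s))  # chain rule with sigmoid' = s * (1 - s)


def loss(w: Dual, x: float, target: float) -> Dual:
    """Squared error of the one-weight network's output sigmoid(w * x)."""
    err = sigmoid(w * x) - target
    return err * err


# Gradient descent on the single weight. Seeding der=1.0 on w makes
# loss(...).der equal to dLoss/dw, computed alongside the loss itself.
w = 0.0
for _ in range(200):
    grad = loss(Dual(w, 1.0), x=2.0, target=0.9).der
    w -= 1.0 * grad  # learning rate of 1.0, chosen for this toy problem
```

After the loop, the output `sigmoid(w * 2.0)` sits close to the target 0.9. With many weights, forward mode needs one pass per weight (seed each one's `der` in turn), which is why reverse mode is preferred at scale.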

Bio/Recursion: CS and Bioinformatics in R

This book is based on a workshop I taught a few times at OSU, meant to introduce computer science theory to biologists.

Digital version available online at Leanpub.

Also available in print at Amazon.

It introduces topics in programming, computer science, and bioinformatics via examples in the R programming language. While often associated with statistics, Bio/Recursion employs R’s algorithmic capabilities to implement and visualize several fascinating methods, ranging from DNA alignment to drawing fractal trees.

A Primer for Computational Biology

Read Online at Open Oregon State (Open-Access & Free!)

Or order a copy from Amazon or OSU Press. (Scribble in the margins!)

This Open-Access textbook was a collaboration with Open Oregon State and OSU Press. It aims to give life scientists and students the skills necessary for research in a data-rich world. The text covers accessing and using remote servers via the command line and writing programs and pipelines for data analysis, and it provides useful vocabulary for interdisciplinary work. The book is broken into three parts:

3D Printing & Modeling, Online Tools

I’ve been teaching an evening class at the Oregon State University Craft Center in “3D modeling and printing.” I don’t know a ton about this topic, to be honest. But I know a little more than I did a year ago, and it was hard-earned knowledge. Why not share it?

At OSU it seems like I can’t walk out the door without bumping into a 3D printer (and students and faculty get access to them), but I found very little in-person information about how to do basic 3D design. There are a ton of tools and tutorials online, but for a while I struggled to find the right ones. Many of the “professional” tools (like AutoCAD and Blender) have very steep learning curves. This post covers some of the more accessible software I found. Because I teach at the Craft Center in a computer lab, I wasn’t keen on installing lots of software, so I especially sought out free, web-browser-based tools.
