A complete listing of my open-source software can be found on my GitHub page.

streaming_pileupy

streaming_pileupy implements a feature missing from samtools mpileup: it creates text-based pileups from SAM files that contain multiple samples, and it does so in a sample-aware manner by using the SM tag of each read group's @RG header line.
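To illustrate what "sample-aware" means here, the sketch below maps read groups to samples using the SM tag and then looks up the sample of a single read through its RG:Z: tag. It is not taken from streaming_pileupy; the function names `read_group_samples` and `sample_of_read` and the example records are assumptions made for illustration only.

```python
def read_group_samples(header_lines):
    """Map each read-group ID to its sample name (SM tag) from SAM @RG header lines."""
    rg_to_sample = {}
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        # After the record type, header fields are tab-separated TAG:VALUE pairs.
        fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        if "ID" in fields and "SM" in fields:
            rg_to_sample[fields["ID"]] = fields["SM"]
    return rg_to_sample


def sample_of_read(sam_line, rg_to_sample):
    """Return the sample of one SAM alignment line via its RG:Z: tag, or None."""
    # The first 11 columns are mandatory; optional tags start at column 12.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("RG:Z:"):
            return rg_to_sample.get(field[5:])
    return None


# Hypothetical header and read, for illustration only.
header = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:rg1\tSM:sample_a",
    "@RG\tID:rg2\tSM:sample_a",
    "@RG\tID:rg3\tSM:sample_b",
]
mapping = read_group_samples(header)
read = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:rg3"
print(mapping)
print(sample_of_read(read, mapping))
```

Several read groups can share one sample, which is why a per-sample pileup has to group reads by SM rather than by read-group ID.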

abeona

Abeona is an experimental transcriptome assembler based on the cortexpy library and kallisto. In this paper we use abeona to reconstruct and visualize transcript isoforms of DAL19 in Norway spruce.

cortexpy

Cortexpy is a Python package for inspecting, traversing, and manipulating colored and linked De Bruijn graphs in Cortex/McCortex format. Special thanks to Kiran V Garimella, who was instrumental in helping me get this project off the ground and who laid the groundwork with his own Cortex library, CortexJDK.
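For readers unfamiliar with colored De Bruijn graphs, here is a minimal plain-Python sketch of the data structure. It does not use the cortexpy API and does not read the Cortex file format; the function name `build_colored_graph` is an assumption, and the sketch ignores reverse complements for simplicity.

```python
from collections import defaultdict


def build_colored_graph(samples, k):
    """Build a toy colored De Bruijn graph.

    `samples` is a list of lists of sequences; the index of each inner list
    is its color. Returns (kmer -> set of colors, kmer -> set of successors).
    """
    colors = defaultdict(set)
    successors = defaultdict(set)
    for color, sequences in enumerate(samples):
        for seq in sequences:
            kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
            for kmer in kmers:
                colors[kmer].add(color)
            # Consecutive k-mers overlap by k-1 bases and are joined by an edge.
            for left, right in zip(kmers, kmers[1:]):
                successors[left].add(right)
    return colors, successors


# Two hypothetical samples that differ only in their last base.
colors, successors = build_colored_graph([["ACGTA"], ["ACGTC"]], k=3)
print(dict(colors))
print(dict(successors))
```

In this toy example the k-mer CGT carries both colors and has two successors, one per sample, which is the kind of branch point a multi-color graph makes visible.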

Visual Cortex

Visual Cortex is a collection of HTML, CSS, and JavaScript that I use to visualize small, multi-color transcript De Bruijn graphs that I create using cortexpy.

[Image: a Visual Cortex example]

dmpy

Dmpy is a Python port of DistributedMake. It’s friendlier on the eyes than DistributedMake…

Where have all the semi-colons gone?

from dmpy import DistributedMake, get_dm_arg_parser

m = DistributedMake(args_object=get_dm_arg_parser().parse_args())

m.add("test_output_file", None, "echo 'hi world'")
m.execute()
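As a rough illustration of the underlying idea, namely describing a rule as a target, its sources, and a command, and turning that into GNU Make syntax, the sketch below renders a single rule as text. It is independent of dmpy and makes no claim about how dmpy works internally; `render_rule` is a hypothetical helper.

```python
def render_rule(target, sources, commands):
    """Render one GNU Make rule as text; each recipe line must start with a tab."""
    if sources is None:
        sources = []
    elif isinstance(sources, str):
        sources = [sources]
    if isinstance(commands, str):
        commands = [commands]
    recipe = "".join(f"\t{cmd}\n" for cmd in commands)
    # rstrip() drops the trailing space when a rule has no sources.
    return f"{target}: {' '.join(sources)}".rstrip() + "\n" + recipe


print(render_rule("test_output_file", None, "echo 'hi world'"), end="")
print(render_rule("out.txt", ["in.txt"], ["cp in.txt out.txt"]), end="")
```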

GLPhase

GLPhase is a CUDA-enabled haplotype phasing and imputation tool for tens of thousands of low-coverage sequencing samples. GLPhase was developed for the Haplotype Reference Consortium. The GLPhase source code is available here, and our paper describing the method is available here.

hapfuse

Hapfuse is a fast haplotype ligation tool designed primarily to work with haplotypes in Binary Call Format, although other formats are supported. Hapfuse is an extension of the hapfuse code written by Yi Wang for the SNPTools package. Hapfuse has been used to ligate haplotypes phased in overlapping blocks in conjunction with GLPhase and with SHAPEIT3.
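The general idea of ligating haplotypes phased in overlapping blocks can be sketched as follows. This is a toy illustration for a single sample and is not hapfuse's algorithm: the function name `ligate`, the string encoding of alleles, and the majority rule for deciding whether to swap are all assumptions.

```python
def ligate(chunk1, chunk2, overlap):
    """Join two phased chunks of one sample whose last/first `overlap` sites coincide.

    Each chunk is a pair of equal-length strings of 0/1 alleles, one string per
    haplotype. Phase labels are arbitrary within a chunk, so the second chunk's
    haplotypes are swapped if that agrees better with the first chunk in the overlap.
    """
    a1, b1 = chunk1
    a2, b2 = chunk2
    if overlap > 0:
        tail = a1[-overlap:]
        same = sum(t == h for t, h in zip(tail, a2[:overlap]))
        swapped = sum(t == h for t, h in zip(tail, b2[:overlap]))
        if swapped > same:
            a2, b2 = b2, a2
    # Keep the overlap from the first chunk and append the rest of the second.
    return a1 + a2[overlap:], b1 + b2[overlap:]


# Hypothetical chunks: the second one was phased with its labels flipped.
print(ligate(("0101", "1010"), ("10110", "01001"), overlap=2))
```

Without the swap check, the joined haplotypes would switch parental origin at every block boundary where the two phasing runs happened to label the haplotypes differently.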

DistributedMake

DistributedMake is a domain-specific language for writing GNU Make files from inside Perl. First created by Kiran V Garimella during his first tour at the Broad Institute, it was further developed by Kiran and me during our time at the Wellcome Trust Centre for Human Genetics.

Long live the semi-colon …

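#!/usr/bin/env perl
# "env perl -w" is not portable: on Linux the shebang passes "perl -w" to env as one argument.
use strict;
use warnings;
use DM;
my @args = (dryRun => 0, numJobs => 100, engineName => 'localhost');
my $dm = DM->new(@args);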

for my $val (1..100){
    $dm->addRule(
        "hello_world_$val",
        "",
        "echo $val; sleep 5; echo ".($val + 100)."; touch hello_world_$val"
    );
}
$dm->execute();