Software Development

Dev Style

Over the last 7 years, I have developed dozens of scripts, APIs and analytical workflows for bioinformatic analyses. In particular, I have released a number of tools for gene therapy quality control by NGS, as well as for nucleotide modification detection by Oxford Nanopore sequencing.

I ❤️ Python and try to write beautiful, easy-to-maintain code, often using object-oriented programming. I follow good programming practices, including documenting my code, version control, testing, packaging and continuous integration/deployment. I run pretty much all my data analyses in Jupyter Notebooks to keep track of everything and to easily share the results with collaborators. When computing speed is an issue, I often use multi-threading and sometimes Cython/C. I know my way around R, but thoroughly despise it. I use Bash all the time, though I would never consider it for a script longer than a few lines. Let's face it, it is ugly as hell.

I develop on Ubuntu Linux and my production code runs on a Red Hat Linux high-performance computing cluster, but I do my best to write portable, multi-platform packages. I have experience in GPU computing and containerisation, and I am getting up to speed in cloud computing as well.

Most of my projects are openly released on GitHub under the MIT or GPL licence. Have a look at my GitHub profile.

Top repositories I developed and maintain

pycoQC

pycoQC started as a lightweight Python API for Nanopore data quality control, written for Jupyter Notebook. I then decided to beef it up by experimenting with Plotly to generate nicer interactive plots that let users explore the data further 📊. Along the way it has gained many new functionalities, and it is now one of the most popular Nanopore QC tools, downloaded more than 60,000 times 😲.

Nanocompore

Nanocompore is a Python command-line tool, co-developed with Tommaso Leonardi, that detects RNA modifications by comparing a sample without modifications to an experimental sample. When we started working on Nanocompore, we were very disappointed by the existing methods and wanted to write a robust and easy-to-use alternative. In our hands, Nanocompore is one of the best-performing options in terms of accuracy.
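The comparative idea can be illustrated with a deliberately simplified sketch. Assuming we already have, for one reference position, the per-read mean current intensities from the unmodified control and from the experimental sample, a two-sample Kolmogorov-Smirnov statistic measures how far apart the two distributions are. This is not Nanocompore's actual implementation: the function name `ks_statistic` and the toy intensity values are hypothetical, and a real analysis would also need p-values and multiple-testing correction across positions.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(control: Sequence[float], test: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_control(x) - F_test(x)|.

    Illustrative only: compares, for ONE hypothetical reference position,
    per-read current intensities of a control sample and an experimental sample.
    """
    if not control or not test:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(control), sorted(test)
    d = 0.0
    # Empirical CDFs are step functions that only jump at observed values,
    # so evaluating both at every data point is enough to find the maximum gap.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


# Toy intensities (hypothetical values) for a single position.
control = [80.1, 81.0, 79.5, 80.7, 80.3]
modified = [84.2, 85.0, 83.9, 84.8, 80.2]
print(round(ks_statistic(control, modified), 3))  # 0.8
```

A statistic close to 0 means the two samples look alike at that position, while values close to 1 flag a strong shift in signal that may point to a modification.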

pycoMeth

pycoMeth is a set of tools written in Python to analyse CpG methylation from Nanopore data, all the way down to differential methylation analysis. Like pycoQC, it generates a user-friendly interactive HTML report that lets users explore their data and identify interesting candidate regions that are differentially methylated between 2 or more conditions. It is still in active development but has already been downloaded more than 10,000 times.

NanoCount

NanoCount implements a simple Expectation-Maximisation algorithm to estimate transcript abundance from Nanopore direct-RNA datasets. It is extremely fast and provides accurate estimates.
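A textbook version of the Expectation-Maximisation idea is sketched below, assuming each read has already been reduced to the list of transcripts it aligns to. The E-step splits every read between its compatible transcripts in proportion to the current abundance estimates; the M-step re-estimates abundances from these fractional counts. The function `em_abundance` and its input format are hypothetical, and NanoCount's actual model (for example how it weights or filters alignments) may differ.

```python
from typing import Dict, List


def em_abundance(
    compat: List[List[str]], max_iter: int = 1000, tol: float = 1e-8
) -> Dict[str, float]:
    """Estimate relative transcript abundances with a basic EM algorithm.

    compat: for each read, the transcripts it is compatible with
    (hypothetical input; a real tool would derive this from alignments).
    Returns a dict mapping transcript -> estimated fraction of reads.
    """
    compat = [hits for hits in compat if hits]  # ignore unassigned reads
    if not compat:
        raise ValueError("no assigned reads")
    transcripts = sorted({t for hits in compat for t in hits})
    # Start from a uniform abundance over all candidate transcripts.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    n_reads = len(compat)
    for _ in range(max_iter):
        counts = dict.fromkeys(transcripts, 0.0)
        # E-step: split each read between its compatible transcripts
        # in proportion to the current abundance estimates.
        for hits in compat:
            total = sum(theta[t] for t in hits)
            for t in hits:
                counts[t] += theta[t] / total
        # M-step: new abundances are the expected read counts, normalised.
        new_theta = {t: c / n_reads for t, c in counts.items()}
        delta = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if delta < tol:
            break
    return theta


# 3 reads unique to tA, 1 unique to tB, 4 ambiguous between both.
reads = [["tA"]] * 3 + [["tB"]] + [["tA", "tB"]] * 4
est = em_abundance(reads)
print({t: round(v, 3) for t, v in est.items()})  # {'tA': 0.75, 'tB': 0.25}
```

In this toy example the ambiguous reads carry no information about the tA:tB ratio, so EM converges to the 3:1 split given by the unique reads, which is the maximum-likelihood solution here.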

pycoSnake

pycoSnake contains a collection of modular 🐍 Snakemake pipelines to streamline Nanopore and Illumina data analyses, including RNA expression, DNA methylation, structural variation... The pipelines are ridiculously simple to deploy on local machines, high-performance computing clusters or in the cloud. Conda is the only requirement; everything else is installed automatically when needed.

pyBioTools

pyBioTools contains small utilities I have developed over time to fill gaps in existing bioinformatics libraries, such as indexing BAM files by read ID or merging and filtering FASTQ files. For a reason I don't really comprehend, it has been downloaded more than 8,000 times 😕
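The kind of FASTQ merging and filtering mentioned above can be sketched in plain Python. This is an illustrative sketch, not pyBioTools' API: the names `read_fastq`, `mean_phred` and `merge_and_filter` are hypothetical, it assumes uncompressed 4-line FASTQ records with Phred+33 quality encoding, and the default cut-offs are arbitrary.

```python
import io
from itertools import chain
from typing import Iterable, Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of stream
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Keep only the read ID (first token of the header).
        yield FastqRecord(header[1:].partition(" ")[0], seq, qual)


def mean_phred(qual: str, offset: int = 33) -> float:
    """Arithmetic mean of Phred scores (Phred+33 encoding assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def merge_and_filter(
    handles: Iterable[TextIO], min_len: int = 1, min_qual: float = 7.0
) -> Iterator[FastqRecord]:
    """Merge several FASTQ streams, keeping reads that pass both cut-offs."""
    for rec in chain.from_iterable(read_fastq(h) for h in handles):
        if len(rec.seq) >= min_len and mean_phred(rec.qual) >= min_qual:
            yield rec


file1 = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n##\n")
file2 = io.StringIO("@r3\nGGCCA\n+\n+++++\n")
print([r.name for r in merge_and_filter([file1, file2], min_len=3)])  # ['r1', 'r3']
```

Averaging Phred scores directly is the simplest choice; some tools instead average the per-base error probabilities, which penalises a few very low-quality bases more strongly.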

Sekator

Sekator is a Python/C hybrid program to detect and trim off any sequence at any location in sequencing reads. It was an exciting project where I got to connect a high-level Python interface to a fast C implementation of the Smith-Waterman alignment algorithm, through a binding layer written in Cython. As a result, it is very fast and accurate 🚀
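The Smith-Waterman local alignment behind this kind of trimming can be illustrated with a small pure-Python sketch. It assumes a linear gap penalty and arbitrary scores (match +2, mismatch -1, gap -2), records where each local alignment starts in the read so the matched segment can be located, and simply cuts that segment out when it scores at least a hypothetical `min_score` threshold. The function names and the trimming rule are assumptions for illustration; a real trimmer might instead discard everything on one side of the match, and this O(m×n) Python loop is exactly the part that benefits from being written in C.

```python
from typing import Tuple


def smith_waterman(
    query: str, target: str, match: int = 2, mismatch: int = -1, gap: int = -2
) -> Tuple[int, int, int]:
    """Local alignment of `query` against `target` with a linear gap penalty.

    Returns (best_score, start, end) where target[start:end] is the aligned
    region of the target. Scoring parameters are illustrative defaults.
    """
    rows, cols = len(query) + 1, len(target) + 1
    score = [[0] * cols for _ in range(rows)]
    start = [[0] * cols for _ in range(rows)]  # target start of alignment ending here
    best = (0, 0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if query[i - 1] == target[j - 1] else mismatch
            diag = score[i - 1][j - 1] + s
            up = score[i - 1][j] + gap  # query base aligned to a gap in target
            left = score[i][j - 1] + gap  # target base aligned to a gap in query
            h = max(0, diag, up, left)
            score[i][j] = h
            if h == 0:
                continue  # local alignment: negative paths are reset to 0
            if h == diag:
                # Extend the previous alignment, or open a new one at target[j-1].
                start[i][j] = start[i - 1][j - 1] if score[i - 1][j - 1] > 0 else j - 1
            elif h == up:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]
            if h > best[0]:
                best = (h, start[i][j], j)
    return best


def trim_adapter(read: str, adapter: str, min_score: int = 10) -> str:
    """Cut the best local match of `adapter` out of `read` if it scores enough."""
    best_score, begin, end = smith_waterman(adapter, read)
    if best_score < min_score:
        return read
    return read[:begin] + read[end:]


adapter = "AGATCGGAAGAGC"
read = "TTGCCATGCA" + adapter + "GGG"
print(trim_adapter(read, adapter))  # TTGCCATGCAGGG
```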