Commonly used phylogenetic methods, Monash HPC access and basic bioinformatics

## Software We Use

The following is a list of frequently used software. In general, try not to install computational software via ESS. The installation instructions shown here are for Macs only.

Generic Software

  • Xcode and the Xcode Command Line Tools (for Macs) - sets up your Mac for development work. Get these first, because many of the computational programs need them to install properly. You can get Xcode directly from the App Store. Getting the Xcode CLT is trickier, as its download and install are known to be quite temperamental.
  • Homebrew - A macOS package manager. Get it here. You'll need Xcode first, and maybe the Xcode CLT as well. If you're lucky, Homebrew will download and install the Xcode CLT for you (but don't count on it).
  • Python3
  • R (with RStudio as the IDE)
  • blastn, blastx (CLI preferred)
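Once the generic software is installed, a quick way to confirm that everything is visible from the terminal is to look each program up on your PATH. The following is a minimal sketch using only the Python standard library. The executable names in `GENERIC_TOOLS` are assumptions about a typical Mac setup (for example, `brew` for Homebrew and `xcode-select` for the Xcode CLT), so adjust them to match your machine.

```python
import shutil

# Executable names are assumptions about a typical Mac setup; edit as needed.
GENERIC_TOOLS = ["xcode-select", "brew", "python3", "R", "blastn", "blastx"]


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(GENERIC_TOOLS).items():
        print(f"{name:14s} {path or 'NOT FOUND'}")
```

Anything reported as NOT FOUND is either not installed or installed somewhere that is not on your PATH.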

Phylogenetics Software

  • MAFFT - For multiple sequence alignment. An executable.
  • RAxML - For computing trees using maximum likelihood. This is available on M3, so installing this is optional. Here's my RAxML installation and quickstart tutorial.
  • IQ-TREE - For tree computation using maximum likelihood. An executable.
  • FastTree - Fast tree computation. An executable. FastTree.c needs to be downloaded and compiled.
  • BEAST v1.8.4, v2.x - For tree computation using Bayesian statistics. Ships with a few companion programs for post-processing BEAST output.
  • FigTree - a nice, lightweight program for tree drawing. Simple .dmg download and install.
  • TempEst - for checking the temporal signal ("clock-likeness") of a tree using root-to-tip regression. Simple .dmg download and install.
  • CD-HIT - For clustering DNA sequences by similarity. Needs to be downloaded and compiled.
  • PAML - Multi-purpose analysis package with miscellaneous uses.
  • Treesub - For plotting amino acid changes along a tree, built around PAML and RAxML. Java-compiled. Get it here
  • HyPhy CLI/GUI - for selection analysis
  • Datamonkey - Web app, so no installation required. Undocumented constraint: it accepts a maximum of 500 sequences.
  • Antigenic cartography - Web app.
  • AliView - An alignment viewer that lets you look at a .fasta file of sequence data.
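Because Datamonkey's 500-sequence limit (noted in the list above) only shows up after you try to upload, it can save time to count the sequences in a .fasta file first. The following is a minimal sketch that counts FASTA header lines using only the standard library. The function names and the file name in the comment are illustrative, and it assumes a plain-text FASTA file where every record starts with `>`.

```python
import io

DATAMONKEY_MAX_SEQS = 500  # limit quoted in the software list above


def count_fasta_records(handle):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in handle if line.startswith(">"))


def fits_datamonkey(handle, limit=DATAMONKEY_MAX_SEQS):
    """Return True if the file has no more than `limit` sequences."""
    return count_fasta_records(handle) <= limit


# Tiny in-memory example; for a real file use: open("my_alignment.fasta")
example = io.StringIO(">seq1\nATGC\n>seq2\nATGA\n")
print(count_fasta_records(example))  # prints 2
```

If the count is over the limit, subsample or cluster the sequences before uploading.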

For Python/R programmers:

  • conda for automatic package management. Conda environments are also great for keeping your packages isolated, and for switching between Python versions (e.g. Py36 and Py27).
  • Useful packages: snakemake, anaconda, pandas, numpy, Biopython, xlrd, scipy, scikit-bio
  • pip
  • Jupyter recommended as an IDE.
  • Atom (text editor)
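To check which of the packages listed above are available in the currently active environment, something like the following sketch may help. It relies only on the standard library. Note that a package's install name is not always its import name: Biopython is imported as `Bio` and scikit-bio as `skbio`. The `PACKAGES` mapping and the `missing_packages` helper are illustrative names, not part of any of these libraries.

```python
import importlib.util

# Install name -> import name. The two differ for Biopython and scikit-bio.
PACKAGES = {
    "snakemake": "snakemake",
    "pandas": "pandas",
    "numpy": "numpy",
    "Biopython": "Bio",
    "xlrd": "xlrd",
    "scipy": "scipy",
    "scikit-bio": "skbio",
}


def missing_packages(packages):
    """Return the install names whose import name cannot be found."""
    return [
        pkg
        for pkg, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it inside each conda environment you care about, since each environment has its own set of installed packages.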

## Things to Read up on

Mathematical Concepts

Recommended YouTube channels: Khan Academy for undergrad-level theory, and mathematicalmonk for higher-level concepts. An in-depth knowledge of these concepts is not essential unless you're aiming to specialize in that area, and you certainly don't need to know how to do the maths by hand. An undergraduate understanding of these is sufficient; even Wikipedia is a little overkill.

  • Linear regression
  • Markov chains
  • Maximum likelihood
  • Bayesian statistics
  • MCMC - the most difficult of the lot. Don't bother if you don't need to know this.
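As a taste of the first item in the list above, here is a minimal sketch of simple linear regression (ordinary least squares with one predictor) written in plain Python. The data are made up for illustration and lie exactly on the line y = 2x + 1; in practice you would use a library such as scipy or R rather than writing this by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # prints 2.0 1.0
```

The slope is the estimated change in y per unit change in x, and the intercept is the predicted y at x = 0.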

Phylogenetics Concepts

Programming

Having a good grasp on how to read, if not write, code is helpful, but not essential.

  • Python or R to start off. IMO, Python is superior to R in every way except for plotting. The YouTube channel thenewboston is a good place to start.
  • GitHub. Here's a recommended video.
  • The Bash terminal, and how Linux/macOS file systems are organized.