# Software We Use
The following is a list of frequently used software. In general, try not to install computational software via ESS. The installation instructions shown here are only for Macs.
- XCode and XCode Command Line Tools (for Macs) - sets up your Mac for development work. Get these first and foremost, because many of the computational programs require them to install properly. You can get XCode directly from the App Store. XCode CLT is trickier, as its download and installation are known to be quite temperamental.
- `homebrew` - A Mac OS package manager. Get it here. You'll need XCode first, and maybe XCode CLT as well. If you're lucky, `homebrew` will help to download and install XCode CLT for you (but don't count on it).
- R (from R Studio)
- MAFFT - For multiple sequence alignment. An executable.
- RAxML - For computing trees using maximum likelihood. This is available on M3, so installing this is optional. Here's my RAxML installation and quickstart tutorial.
- IQ-Tree - For tree computation using maximum likelihood. An executable.
- FastTree - Fast tree computation. An executable. `FastTree.c` needs to be downloaded and compiled.
- BEAST v1.8.4, v2.x - For tree computation using Bayesian statistics. Has a few other programs for post-processing BEAST output.
- FigTree - a nice, lightweight program for tree drawing. Simple `.dmg` download and install.
- TempEst - another nice program for tree drawing. Simple `.dmg` download and install.
- CDhit - For clustering DNA sequences by similarity. Needs to be downloaded and compiled.
- PAML - Multi-purpose analysis package with miscellaneous uses.
- Treesub - For plotting amino acid changes along a tree, built around `RAxML`. Java-compiled. Get it here.
- HyPhy CLI/GUI - for selection analysis
- Datamonkey - Web app, so no installation required. Hidden constraint that they don't announce beforehand: Can only accept a max of 500 sequences.
- Antigenic cartography - Web app.
- AliViewer - Allows you to look at a `.fasta` file of sequence data.
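
Since several of the tools above take `.fasta` input, and Datamonkey is listed with a 500-sequence cap, it can help to count the sequences in a file before uploading. The following is a minimal sketch rather than a replacement for a proper parser: `read_fasta` and `DATAMONKEY_MAX_SEQS` are names made up for this example, the toy data stands in for a real file, and the limit is simply the figure quoted in the list above.

```python
from io import StringIO

# Limit quoted in the software list above; an assumption that it still holds.
DATAMONKEY_MAX_SEQS = 500


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Toy data standing in for a real .fasta file
example = StringIO(">seq1\nATGC\nATGA\n>seq2\nATGCATGG\n")
records = list(read_fasta(example))

print(len(records), "sequences")
for name, seq in records:
    print(name, len(seq))
print("Within the Datamonkey limit:", len(records) <= DATAMONKEY_MAX_SEQS)
```

For a real file, pass an open file handle (for example from `open()`) to `read_fasta` instead of the `StringIO` object.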
For Python/R programmers:
- conda for automatic package management. Conda environments are also great for isolating your packages, and for switching between Python versions such as Py36 and Py27.
- Useful packages:
- Jupyter recommended as an IDE.
## Things to Read up on
Recommended YouTube channels: Khan Academy for undergrad-level theory, and mathematicalmonk for higher-level concepts. An in-depth knowledge of these concepts is not essential unless you're aiming to specialize in that area, and you certainly don't need to know how to do the maths by hand. An undergraduate understanding of these is sufficient; even Wikipedia is a little overkill.
- Linear regression
- Markov chains
- Maximum likelihood
- Bayesian statistics
- MCMC - the most difficult of the lot. Don't bother if you don't need to know this.
- The Wikipedia article on computational phylogenetics is a good starting point - it's sufficiently comprehensive that, at least, you'll be able to pinpoint what you don't know and look for that elsewhere. Also, admittedly, passive reading is a pretty dry and ineffective way to learn; there are "learn by doing"-style tutorials in the works.
- How to interpret a phylogenetic tree, by Andrew Rambaut. Or this Khan Academy video.
- Models of DNA substitution, from Wikipedia.
- Hierarchical clustering. The version usually seen in phylogenetics literature is "neighbour joining (NJ)". We don't use NJ trees very often, but it's a good conceptual starting point, and is easy enough to do by hand.
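
To make the "do it by hand" idea concrete, here is a rough sketch of hierarchical clustering on a tiny distance matrix. Note that it uses plain average linkage (UPGMA-style), which is simpler than neighbour joining proper: NJ adds a correction for lineages evolving at different rates. The function name `average_linkage` and the four-taxon distances are invented for illustration.

```python
from itertools import combinations


def average_linkage(dist, names):
    """Agglomerative clustering with average linkage (UPGMA-style).

    dist: dict mapping frozenset({a, b}) -> distance between leaves a and b.
    names: list of leaf names.
    Returns the merges in order as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(name,) for name in names]  # each cluster is a tuple of leaves
    merges = []

    def cluster_distance(c1, c2):
        # Mean of all leaf-to-leaf distances between the two clusters
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair among the current clusters
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        # Replace the two clusters with their union
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 + c2)
    return merges


# Made-up pairwise distances between four taxa
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}

merges = average_linkage(dist, names)
for left, right, d in merges:
    print(left, "+", right, "at distance", d)
```

With these distances, A and B join first, C joins that pair next, and D joins last, which is the same order you would get working through the matrix on paper.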
Having a good grasp on how to read, if not write, code is helpful, but not essential.
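
As an example of the sort of code worth being able to read, here is a small sketch tied to the "models of DNA substitution" item in the list above. It computes the proportion of differing sites between two aligned sequences, then applies the Jukes-Cantor (JC69) correction, d = -(3/4) ln(1 - (4/3) p). The function names and the two toy sequences are made up for this example, and the code assumes the sequences are already aligned with no gaps or ambiguity codes.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jc69_distance(seq1, seq2):
    """Jukes-Cantor distance: expected substitutions per site."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        # The correction is undefined once sequences look saturated
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy aligned sequences differing at 2 of 10 sites
s1 = "ACGTACGTAC"
s2 = "ACGTACGTTT"

p = p_distance(s1, s2)
d = jc69_distance(s1, s2)
print("p-distance:", p)
print("JC69 distance:", round(d, 4))
```

The corrected distance comes out a little larger than the raw proportion of differences, because the model allows for multiple substitutions having hit the same site.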