Commonly used phylogenetic methods, Monash HPC access and basic bioinformatics

M3 Tips and Tricks

Frequently Used HPC Commands

squeue                          View all jobs currently queued or running on the HPC
squeue --user=<your_username>   View just your own jobs
show_cluster                    Check the availability of different nodes
mkdir <name_of_new_folder>      Create a new folder
ls                              See what files there are in the folder that you're currently in
ls --color=auto                 Same as ls, but with colour-coding by file type (on macOS, use ls -G)
nano my_file                    View or edit my_file
rm file_to_delete               Delete a file

You can alias some of these commands to avoid typing long commands over and over again. Open your ~/.bash_profile for editing using:

nano ~/.bash_profile

And add an alias like:

alias qu="squeue --user=<your_username>"
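
As a variation on the alias above, the user name does not have to be typed in by hand. The sketch below assumes that your login shell sets the standard $USER variable (it normally does); the single quotes delay expansion of $USER until the alias is actually used.

```shell
# Sketch of a line that could be added to ~/.bash_profile.
# Assumption: $USER holds your cluster user name, as set by the login shell.
alias qu='squeue --user=$USER'

# Print the definition to confirm that the alias is registered.
alias qu
```

An alias added to ~/.bash_profile only takes effect in new login sessions. To use it in the current session, run source ~/.bash_profile once.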

Running the same command on multiple files

Say I wish to perform the same MSA on several fasta files. That is, I'd like to run:

mafft input.fasta > output.fasta

A typical example is doing multiple-sequence alignment on all eight gene segments of a flu virus, which exist as eight .fasta files.

You can create a text file with a list of bash commands, and run them all one after another by calling the bash command on that text file. For instance, create a file my_run.txt, containing the following:

echo "first command"
echo "second command"
date

And in your terminal, cd to the place where you've saved my_run.txt and do:

bash my_run.txt

This will print out "first command", "second command", and today's date.

Similarly, you can prepare a msa_run.txt file like:

mafft input1.fasta > output1.fasta
mafft input2.fasta > output2.fasta
mafft input3.fasta > output3.fasta

And a SLURM script whose body, after the usual #SBATCH header, loads mafft and runs the command list:

module load mafft/7.310
bash msa_run.txt

Upload the fasta files, the run text file and the SLURM script onto M3, and submit the SLURM job as usual. This will execute each mafft command one after another.
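
Writing the run file by hand gets tedious with many inputs. As a rough sketch, a short loop can generate it from whatever .fasta files are in the current folder. The function name make_msa_commands and the aligned/ output folder are assumptions made for this example, not part of the original workflow. The sketch also assumes file names without spaces.

```shell
#!/bin/bash
# Print one mafft command for every .fasta file in the current folder.
# Outputs go into a separate aligned/ folder (a hypothetical name) so that
# re-running the loop does not pick up earlier alignments as new inputs.
make_msa_commands() {
    echo "mkdir -p aligned"
    for f in *.fasta; do
        [ -e "$f" ] || continue      # no .fasta files found: print nothing more
        echo "mafft $f > aligned/$f"
    done
}

# Save the generated commands as the run file, then inspect it.
make_msa_commands > msa_run.txt
cat msa_run.txt
```

Check the contents of msa_run.txt before uploading it, since the loop lists every .fasta file in the folder, whether or not you meant to align it.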

Running multiple commands simultaneously

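To run several commands at the same time, submit them as a SLURM job array. The following script header comes from an answer by help@massive: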

#!/bin/bash
# Job name for the array
#SBATCH -J myprog-%3
# Number of tasks
#SBATCH -n 1
# Partition com
#SBATCH -p com
# Time limit of 3 hours (D-HH:MM:SS)
#SBATCH -t 0-3:00:00
# Standard output; "%A" is replaced by the job ID and "%a" by the array index
#SBATCH -o myprog-%A-%a.out
# Standard error
#SBATCH -e myprog-%A-%a.err

You will need to rename your programs to myprogram-1, myprogram-2 and myprogram-3, one for each array index.

To submit this:

sbatch --array=1-3 ArraySubmit.slurm
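
The header above does not show how each array task picks its own program. A minimal sketch of what the body of ArraySubmit.slurm might look like follows. It relies on the SLURM_ARRAY_TASK_ID environment variable, which SLURM sets for every task in a job array. The helper name program_for_task is hypothetical, and the launch line is commented out so that the sketch can be tried outside SLURM.

```shell
#!/bin/bash
# Map an array index to a program name, following the myprogram-N naming scheme.
program_for_task() {
    echo "./myprogram-$1"
}

# SLURM sets SLURM_ARRAY_TASK_ID for each array task (1, 2 or 3 with --array=1-3).
# The fallback value 1 only exists so the sketch can be tried outside SLURM.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
program="$(program_for_task "$task_id")"
echo "Array task ${task_id} runs ${program}"

# In the real job script, this line would launch the program:
# "$program"
```

The same index could instead select an input file, for example input${SLURM_ARRAY_TASK_ID}.fasta, so that the mafft runs from the previous section execute side by side rather than one after another.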