Molecular Epidemiology of Infectious Diseases - computer practical: Glossary

Key Points

Introduction
  • We will work towards confirming or disputing transmission in TB cases

  • After this practical training you will have some familiarity with working on the command line

Logging into Server
  • You can use PuTTY to log in to a server.

  • Logging off a server is not the same as turning off a server

Introducing the Shell
  • The shell gives you the ability to work more efficiently by using keyboard commands rather than a GUI.

  • Useful commands for navigating your file system include: ls, pwd, and cd.

  • Most commands take options (flags) which begin with a -.

  • Tab completion can reduce errors from mistyping and make work more efficient in the shell.
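
The commands above can be tried in a short session such as the one below. This is only a sketch: the directory name `reads` and the file names are hypothetical and exist purely for illustration, and everything happens in a temporary directory so none of your own files are touched.

```shell
# Work in a throwaway directory (created by mktemp) so nothing of yours is changed.
workdir=$(mktemp -d)
cd "$workdir"

pwd                  # print the current working directory

# Hypothetical directory and file names, created only for this example.
mkdir reads
touch reads/sample_1.fastq reads/sample_2.fastq

ls                   # list the contents of the current directory
ls -l reads          # the -l option (flag) switches to the long listing format

cd reads             # move into the reads directory
listing=$(ls)        # capture the output of ls in a variable
echo "$listing"
```

While typing `cd reads` interactively, pressing Tab after `cd r` should complete the directory name for you.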

Navigating Files and Directories
  • The /, ~, and .. characters represent important navigational shortcuts.

  • Hidden files and directories start with . and can be viewed using ls -a.

  • Relative paths specify a location starting from the current location, while absolute paths specify a location from the root of the file system.
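
As an illustration of these shortcuts, the sketch below reaches the same directory once through relative paths and once through an absolute path, then compares `ls` with `ls -a`. The names `project`, `data` and `.hidden_notes` are made up for the example.

```shell
# Throwaway directory; all names below are hypothetical.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p project/data
touch project/.hidden_notes

cd project/data          # relative path: starts from the current directory
cd ..                    # .. is the parent directory, here project
relative_result=$(pwd)

cd "$workdir/project"    # absolute path: starts from the root of the file system (/)
absolute_result=$(pwd)

visible=$(ls)            # lists data, but not .hidden_notes
all=$(ls -a)             # also lists ., .. and .hidden_notes
echo "$visible"
echo "$all"

echo ~                   # ~ expands to the path of your home directory
```

Both routes end in the same place, so `relative_result` and `absolute_result` hold the same path.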

Working with Files and Directories
  • You can view file contents using less, cat, head or tail.

  • The commands cp, mv, and mkdir are useful for manipulating existing files and creating new directories.

  • You can view file permissions using ls -l and change permissions using chmod.

  • The up arrow on your keyboard can be used to repeat recently used commands.
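
A minimal sketch of these commands working together is shown below. The file and directory names are invented for the example, and the small text file is created with `printf` and `>` only so that there is something to look at.

```shell
# Throwaway directory; notes.txt and backup are hypothetical names.
workdir=$(mktemp -d)
cd "$workdir"

printf 'line 1\nline 2\nline 3\n' > notes.txt   # create a small example file

cat notes.txt              # print the whole file
head -n 1 notes.txt        # print only the first line
tail -n 1 notes.txt        # print only the last line

mkdir backup                          # create a new directory
cp notes.txt backup/notes_copy.txt    # copy: the original stays where it is
mv notes.txt notes_old.txt            # move (rename): the old name disappears

ls -l backup                          # permissions appear in the first column
chmod a-w backup/notes_copy.txt       # remove write permission for everyone
ls -l backup                          # the w characters are now gone
```

`less notes_old.txt` would open the file in an interactive pager (quit with `q`), which is why it is left out of this non-interactive sketch.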

Redirection
  • grep is a powerful search tool with many options for customization.

  • > and | are different ways of redirecting output.

  • command > file redirects a command’s output to a file.

  • command_1 | command_2 redirects the output of the first command as input to the second command.
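
The difference between `>` and `|` can be seen in a small example like the following. The FASTQ-like file is made up for illustration and is much smaller than real sequencing data.

```shell
# Throwaway directory and a tiny, made-up FASTQ-like file with two reads.
workdir=$(mktemp -d)
cd "$workdir"
printf '@read1\nACGT\n+\nIIII\n@read2\nNNNN\n+\n!!!!\n' > example.fastq

grep NNNN example.fastq                   # matching lines are printed on screen
grep NNNN example.fastq > bad_reads.txt   # > sends the same output to a file instead

grep '^@read' example.fastq | wc -l       # | passes grep's output to wc, which counts lines

# The pipe can also feed another command, here head, to keep only the first match.
first_header=$(grep '^@read' example.fastq | head -n 1)
echo "$first_header"
```

Note that `>` overwrites the target file if it already exists, so it is worth checking the file name before pressing Enter.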

Downloading data
  • wget is a command-line program for downloading data from the internet.

  • screen can be used to run a program even when the user is disconnected

  • ‘for’ loops let you perform the same set of operations on multiple files with a single command

  • Sequencing data files are large, so downloading them can take a long time.
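
The `for` loop mentioned above can be sketched as follows. The sample file names are hypothetical empty placeholders, and the loop body only prints and records each sample name; in a real analysis the body would call the tool you want to run on each file. `wget` and `screen` are not shown because they need a network connection and an interactive terminal.

```shell
# Throwaway directory with three hypothetical, empty FASTQ files.
workdir=$(mktemp -d)
cd "$workdir"
touch sample_A.fastq sample_B.fastq sample_C.fastq

# Run the same commands once for every file matching *.fastq.
for filename in *.fastq
do
    # basename with a suffix argument strips the .fastq extension.
    sample=$(basename "$filename" .fastq)
    echo "processing $sample"
    echo "$sample" >> processed_samples.txt   # >> appends instead of overwriting
done

cat processed_samples.txt
```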

Assessing Read Quality and Trimming
  • In bioinformatic workflows the output of one tool is often the input of the next.

  • FastQC is used to judge the quality of sequencing reads.

  • Data cleaning is an essential step in a genomics pipeline.

Sequence assembly
  • Assembly is a process which aligns and merges fragments from a longer DNA sequence in order to reconstruct the original sequence.

  • k-mers are short fragments of DNA of length k
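
To make the idea of k-mers concrete, the sketch below lists every k-mer of a short, made-up sequence by sliding a window of length k along it one base at a time. It is an illustration of the definition only, not of how an assembler is implemented. A sequence of length L contains L - k + 1 such windows.

```shell
sequence="ATGGCGTGCA"   # made-up 10-base sequence, for illustration only
k=4                     # k-mer length
len=${#sequence}        # length of the sequence in characters

start=1                 # cut counts positions from 1
count=0
while [ $((start + k - 1)) -le "$len" ]
do
    # Extract the characters from position start to start + k - 1.
    kmer=$(echo "$sequence" | cut -c "${start}-$((start + k - 1))")
    echo "$kmer"
    count=$((count + 1))
    start=$((start + 1))
done

echo "number of ${k}-mers: $count"
```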

SNP phylogeny
  • Single nucleotide polymorphisms can be identified by mapping reads to a reference genome

  • Parameters for the analysis have to be selected based on expected outcomes for this organism

  • Concatenation of SNPs helps to reduce analysis volume

  • Phylogenetic trees can be written with a bracket syntax in Newick format
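
As a small illustration of the bracket syntax, the tree below is written in Newick format with made-up isolate names and branch lengths. Each pair of brackets groups the branches that share a common ancestor, the number after each colon is a branch length, and the tree ends with a semicolon. The commands after it pull the leaf names out of the string with standard shell tools; this is a sketch for this simple case, not a general Newick parser.

```shell
# Hypothetical tree: isolate_A and isolate_B are each other's closest relatives.
tree='((isolate_A:1,isolate_B:2):3,isolate_C:5);'
echo "$tree"

# Turn brackets, commas and the semicolon into line breaks (squeezing repeats),
# keep only lines starting with a letter (the leaves), and drop the branch lengths.
leaves=$(echo "$tree" | tr -s '(),;' '\n\n\n\n' | grep '^[A-Za-z]' | cut -d: -f1)
echo "$leaves"
```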

Data Visualization
  • Genetic information can confirm or contradict the metadata.

Transmission?
  • SNP phylogeny and metadata can convey different messages

  • Human interpretation is often needed to weigh the different information sources.

  • The low mutation rate of M. tuberculosis does not allow confident inference of transmission, but it does allow transmission to be excluded.

Glossary

The glossary would go here, formatted as:

{:auto_ids}
key word 1
:   explanation 1

key word 2
:   explanation 2

({:auto_ids} is needed at the start so that Jekyll will automatically generate a unique ID for each item to allow other pages to hyperlink to specific glossary entries.) This renders as:

key word 1
explanation 1
key word 2
explanation 2