Glossary

Key Points

Introduction
  • After the first two days you will have some familiarity with working on the command line, data management, data cleaning and visualization, and automation and scripting.

Logging onto Cloud
  • You can use one set of log-in credentials for many instances.

  • Logging off an instance is not the same as turning off an instance.

Introducing the Shell
  • The shell gives you the ability to work more efficiently by using keyboard commands rather than a GUI.

  • Useful commands for navigating your file system include: ls, pwd, and cd.

  • Most commands take options (flags) which begin with a -.

  • Tab completion can reduce errors from mistyping and make work more efficient in the shell.
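A short session illustrating these commands. It runs inside a temporary directory created with mktemp -d, and the names demo and notes.txt are made up for the example.

```bash
# Work in a throwaway directory so nothing else is touched
workdir=$(mktemp -d)
cd "$workdir"

pwd                    # print the working directory (the temporary path)
mkdir demo             # make a directory to explore
touch demo/notes.txt   # create an empty file inside it
ls                     # list the contents: demo
ls -F                  # the -F flag marks directories with a trailing /: demo/
cd demo                # move into the new directory
ls -l                  # the -l flag gives a long listing (permissions, size, date)
```

Tab completion only works in an interactive shell: typing cd de and pressing Tab completes the name to demo/.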

Navigating Files and Directories
  • The /, ~, and .. characters represent important navigational shortcuts.

  • Hidden files and directories start with . and can be viewed using ls -a.

  • Relative paths specify a location starting from the current location, while absolute paths specify a location from the root of the file system.
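The sketch below builds a small hypothetical tree (project, data, results and a hidden .hidden_config file) in a temporary directory, then moves around it with absolute and relative paths.

```bash
# Build a small hypothetical directory tree in a temporary location
root=$(mktemp -d)
mkdir -p "$root/project/data" "$root/project/results"
touch "$root/project/.hidden_config"   # leading "." makes this file hidden

cd "$root/project/data"   # absolute path: begins at the root directory /
cd ../results             # relative path: .. is the parent of the current directory
pwd                       # prints $root/project/results

cd ..                     # up one level, into project
ls                        # data  results  (the hidden file is not listed)
ls -a                     # also lists ., .., and .hidden_config

cd ~                      # ~ is shorthand for your home directory
```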

Working with Files and Directories
  • You can view file contents using less, cat, head or tail.

  • The commands cp, mv, and mkdir are useful for manipulating existing files and creating new directories.

  • You can view file permissions using ls -l and change permissions using chmod.

  • The history command and the up arrow on your keyboard can be used to repeat recently used commands.
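A minimal sketch of these file commands on a throwaway file; sample.txt and the backup directory are hypothetical names. less is left commented out because it waits for keyboard input, and the exact permission strings printed by ls -l depend on your umask.

```bash
work=$(mktemp -d)
cd "$work"

# A small hypothetical text file: "line 1" to "line 10"
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > sample.txt

cat sample.txt           # print the whole file
head -n 3 sample.txt     # first 3 lines
tail -n 2 sample.txt     # last 2 lines
# less sample.txt        # interactive pager (press q to quit)

mkdir backup                                     # create a new directory
cp sample.txt backup/sample_copy.txt             # copy: the original stays in place
mv backup/sample_copy.txt backup/sample_old.txt  # mv renames (or moves) a file

ls -l sample.txt         # first column shows permissions, e.g. -rw-r--r--
chmod u+x sample.txt     # add execute permission for the owner (user)
ls -l sample.txt         # owner permissions now include x, e.g. -rwxr--r--
```

history and the up arrow apply to an interactive shell, so they are not part of this script.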

Redirection
  • grep is a powerful search tool with many options for customization.

  • >, >>, and | are different ways of redirecting output.

  • command > file redirects a command’s output to a file.

  • command >> file redirects a command’s output to a file without overwriting the existing contents of the file.

  • command_1 | command_2 redirects the output of the first command as input to the second command.
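The example below uses a few made-up FASTQ-style lines (not real data) to show grep together with >, >> and |.

```bash
work=$(mktemp -d)
cd "$work"

# Hypothetical read records: a header line followed by a sequence line
cat > reads.txt <<'EOF'
@read1
GATTACANNNNTTT
@read2
GATTACAGATTACA
@read3
NNNNNNNNNNNNNN
EOF

grep NNNN reads.txt                      # print lines containing NNNN
grep -c NNNN reads.txt                   # -c counts matching lines: 2
grep -B1 NNNN reads.txt > bad.txt        # > creates or overwrites bad.txt; -B1 adds the line before each match
grep -v '^@' reads.txt >> all_seqs.txt   # >> appends (creating the file if it is missing)
grep -v '^@' reads.txt >> all_seqs.txt   # appended again: all_seqs.txt now has 6 lines
grep NNNN reads.txt | wc -l              # | sends grep's output to wc -l: 2
```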

Writing Scripts
  • Scripts are a collection of commands executed together.
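A minimal script sketch: count_lines.sh is a hypothetical name, and the script simply runs the same commands for each file named on its command line.

```bash
work=$(mktemp -d)
cd "$work"

# Write a tiny hypothetical script to a file
cat > count_lines.sh <<'EOF'
#!/usr/bin/env bash
# Report how many lines each file given as an argument contains
for f in "$@"; do
    n=$(( $(wc -l < "$f") ))   # arithmetic expansion strips any padding from wc
    echo "$f: $n lines"
done
EOF

printf 'a\nb\nc\n' > three.txt
bash count_lines.sh three.txt    # run the script with bash ...
chmod +x count_lines.sh
./count_lines.sh three.txt       # ... or make it executable and run it directly
```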

Project Organization
  • Spend the time to organize your file system when you start a new project. Your future self will thank you!

  • Always save a write-protected copy of your raw data.
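One way to set this up, sketched with a hypothetical layout (data/untrimmed_fastq, results, docs) and a placeholder FASTQ file. chmod a-w removes write permission for owner, group and others; note that the root user can still write to such a file.

```bash
project=$(mktemp -d)

# Hypothetical project layout
mkdir -p "$project"/{data/untrimmed_fastq,results,docs}
printf '@r1\nACGT\n+\nIIII\n' > "$project/data/untrimmed_fastq/sample_R1.fastq"  # placeholder raw file

cd "$project/data/untrimmed_fastq"
chmod a-w *.fastq          # write-protect every raw FASTQ file
ls -l sample_R1.fastq      # e.g. -r--r--r--: no w left in the permission string
```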

Assessing Read Quality
  • Quality encodings vary across sequencing platforms.

  • for loops let you perform the same set of operations on multiple files with a single command.
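Two sketches for this section. The first decodes Phred+33 quality characters, the encoding used by Sanger and Illumina 1.8+ FASTQ files (older Illumina pipelines used an offset of 64 instead). The second is a for loop that counts reads in every FASTQ file in a directory; phred33 is a hypothetical helper and the two tiny files are made up.

```bash
# Phred+33: quality score Q is stored as the ASCII character with code Q + 33
phred33() {
    local code
    code=$(printf '%d' "'$1")   # numeric ASCII code of the character
    echo $(( code - 33 ))
}
phred33 I      # 73 - 33 = 40
phred33 '#'    # 35 - 33 = 2

# A for loop applies the same commands to every file matching a pattern
work=$(mktemp -d)
cd "$work"
printf '@r1\nACGT\n+\nIIII\n' > a.fastq
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\n####\n' > b.fastq
for filename in *.fastq; do
    reads=$(( $(wc -l < "$filename") / 4 ))   # FASTQ uses 4 lines per read
    printf '%s\t%d\n' "$filename" "$reads"
done
```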

Trimming and Filtering
  • The options you set for the command-line tools you use are important!

  • Data cleaning is an essential step in a genomics workflow.

Variant Calling Workflow
  • Bioinformatics command line tools are collections of commands that can be used to carry out bioinformatics analyses.

  • Most powerful bioinformatics tools have to be run from the command line.

  • There are many different file formats for storing genomics data. It’s important to understand these file formats and know how to convert among them.

Automating a Variant Calling Workflow
  • We can combine multiple commands into a shell script to automate a workflow.

  • Use echo statements within your scripts to get an automated progress update.
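A minimal sketch of a workflow script with echo progress messages. The sample names (SRR001, SRR002) and run_pipeline.sh are hypothetical, and the per-sample step is only a placeholder where real trimming, alignment or variant-calling commands would go.

```bash
work=$(mktemp -d)
cd "$work"
# Hypothetical per-sample inputs
for s in SRR001 SRR002; do printf '@r\nACGT\n+\nIIII\n' > "${s}.fastq"; done

cat > run_pipeline.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail                 # stop at the first failing command
echo "Starting pipeline at $(date)"
mkdir -p results
n=0
for fq in *.fastq; do
    base=$(basename "$fq" .fastq)
    echo "Working with sample: $base"
    # Placeholder step standing in for the real analysis commands
    wc -l < "$fq" > "results/${base}.linecount"
    n=$((n + 1))
done
echo "Finished: processed $n samples"
EOF

bash run_pipeline.sh
```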

R for microbial genomics
Introduction Day 3
  • Sequencing S. pneumoniae patient isolates to determine associations of bacterial genes with disease severity.

Sequence assembly
  • Assembly is a process which aligns and merges fragments from a longer DNA sequence in order to reconstruct the original sequence.

  • k-mers are short fragments of DNA of length k
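To make the definition concrete, this bash function lists every overlapping k-mer of a sequence using substring expansion; kmers is a hypothetical helper name.

```bash
# Print each k-mer (substring of length k) of a sequence, one per line
kmers() {
    local seq=$1 k=$2 i
    for (( i = 0; i + k <= ${#seq}; i++ )); do
        echo "${seq:i:k}"
    done
}

kmers GATTACA 3    # GAT ATT TTA TAC ACA
```

A sequence of length L has L - k + 1 overlapping k-mers (here 7 - 3 + 1 = 5). Piping the output through sort | uniq -c counts each distinct k-mer; tallies like this are the raw material of de Bruijn graph assemblers.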

Sequence Quality
  • The quality of a genome assembly can be assessed from basic statistics on the assembly itself, and also by comparing it with an external reference.
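N50 is one commonly reported basic statistic: the contig length L such that contigs of length L or longer together contain at least half of the assembled bases. The sketch below computes it with awk; fasta_lengths and n50 are hypothetical helper names, and dedicated assessment tools report this alongside many other metrics.

```bash
# Print the length of each sequence in a FASTA file read from stdin
fasta_lengths() {
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }'
}

# N50: walk contigs from longest to shortest and report the length at which
# the running total first reaches half of all assembled bases
n50() {
    sort -rn | awk '{ len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                cum += len[i]
                if (cum * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Hypothetical contig lengths: total 280, half is 140, and 100 + 80 = 180 >= 140
printf '%s\n' 100 30 80 20 50 | n50    # prints 80

# For an assembly file (hypothetical name): fasta_lengths < contigs.fasta | n50
```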

Inspecting sequence graphs
  • A genome assembly is fragmented because of repeats in the genome. The assembly graph displays possible connections between contigs.

Annotation
  • Genome annotation includes prediction of protein-coding genes, as well as other functional genome units.

  • It often starts by identifying open reading frames.

  • Predicted sequences are further analysed with BLAST.

  • Larger DNA sequences or genomes require automated prediction and annotation.

Pangenome analysis
  • The microbial pangenome is the union of genes in genomes of interest.

  • The microbial core genome is the intersection of genes shared by genomes of interest.

  • Roary is a pipeline to determine genes of the pangenome.
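The set logic behind the pangenome and core genome can be sketched with sort and comm on plain gene lists. The three gene lists are hypothetical, and Roary itself clusters protein sequences rather than comparing gene names, so this only illustrates union versus intersection.

```bash
export LC_ALL=C    # byte-wise sort order, so sort and comm agree
work=$(mktemp -d)
cd "$work"

# Hypothetical gene lists, one gene name per line, for three genomes
printf '%s\n' dnaA gyrB pspA lytA > genomeA.txt
printf '%s\n' dnaA gyrB lytA cbpA > genomeB.txt
printf '%s\n' dnaA gyrB pspC lytA > genomeC.txt

# Pangenome = union: every gene found in at least one genome
sort -u genomeA.txt genomeB.txt genomeC.txt > pangenome.txt

# Core genome = intersection: keep only genes present in every genome
# (comm -12 prints the lines common to two sorted inputs)
sort genomeA.txt > core.txt
for g in genomeB.txt genomeC.txt; do
    sort "$g" | comm -12 core.txt - > core.tmp
    mv core.tmp core.txt
done

cat pangenome.txt   # cbpA dnaA gyrB lytA pspA pspC (one per line)
cat core.txt        # dnaA gyrB lytA
```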

Phylogenetic trees from the core genome
  • A tree can be generated from the combined set of core-genome proteins for better resolution.

Bacterial GWAS
  • Contingency testing of gene presence/absence can associate a genotype with a phenotype in bacterial genomes, similar to GWAS in clinical genetics.
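The first step of such a test is a 2x2 contingency table of gene presence versus phenotype. The sketch below tallies it with awk from a made-up table of six isolates; the column layout and the contingency helper are hypothetical.

```bash
# Hypothetical isolates: name, gene present (1) or absent (0), severe (1) or mild (0) disease
table='iso1 1 1
iso2 1 1
iso3 1 0
iso4 0 0
iso5 0 1
iso6 0 0'

# Print the 2x2 counts in the order:
#   present&severe  present&mild  absent&severe  absent&mild
contingency() {
    awk '{ n[$2 "," $3]++ }
         END { print n["1,1"] + 0, n["1,0"] + 0, n["0,1"] + 0, n["0,0"] + 0 }'
}

printf '%s\n' "$table" | contingency    # prints 2 1 1 2
```

For these toy counts the odds ratio is (2 * 2) / (1 * 1) = 4. A significance test such as Fisher's exact test (for example fisher.test() in R) would then be run on the table, and when thousands of genes are tested the p-values need multiple-testing correction.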

Wrap-up Days 3 and 4
  • Phage-Derived Protein Induces Increased Platelet Activation and Is Associated with Mortality in Patients with Invasive Pneumococcal Disease

Glossary
