: Glossary

Key Points

Introduction	After the first two days you will have some familiarity with working on the command line, data management, cleaning and visualization, and automation and scripting.
Logging onto Cloud	You can use one set of log-in credentials for many instances. Logging off an instance is not the same as turning off an instance.
Introducing the Shell	The shell gives you the ability to work more efficiently by using keyboard commands rather than a GUI. Useful commands for navigating your file system include: `ls`, `pwd`, and `cd`. Most commands take options (flags) which begin with a `-`. Tab completion can reduce errors from mistyping and make work more efficient in the shell.
Navigating Files and Directories	The `/`, `~`, and `..` characters represent important navigational shortcuts. Hidden files and directories start with `.` and can be viewed using `ls -a`. Relative paths specify a location starting from the current location, while absolute paths specify a location from the root of the file system.
Working with Files and Directories	You can view file contents using `less`, `cat`, `head` or `tail`. The commands `cp`, `mv`, and `mkdir` are useful for manipulating existing files and creating new directories. You can view file permissions using `ls -l` and change permissions using `chmod`. The `history` command and the up arrow on your keyboard can be used to repeat recently used commands.
Redirection	`grep` is a powerful search tool with many options for customization. `>`, `>>`, and `|` are different ways of redirecting output. `command > file` redirects a command’s output to a file. `command >> file` redirects a command’s output to a file without overwriting the existing contents of the file. `command_1 | command_2` redirects the output of the first command as input to the second command.
Writing Scripts	Scripts are a collection of commands executed together.
Project Organization	Spend the time to organize your file system when you start a new project. Your future self will thank you! Always save a write-protected copy of your raw data.
Assessing Read Quality	Quality encodings vary across sequencing platforms. `for` loops let you perform the same set of operations on multiple files with a single command.
Trimming and Filtering	The options you set for the command-line tools you use are important! Data cleaning is an essential step in a genomics workflow.
Variant Calling Workflow	Bioinformatics command line tools are collections of commands that can be used to carry out bioinformatics analyses. To use most powerful bioinformatics tools, you’ll need to use the command line. There are many different file formats for storing genomics data. It’s important to understand these file formats and know how to convert among them.
Automating a Variant Calling Workflow	We can combine multiple commands into a shell script to automate a workflow. Use `echo` statements within your scripts to get an automated progress update.
R for microbial genomics	R-microbial-genomics
Introduction Day3	Sequencing S. pneumoniae patient isolates to determine associations of bacterial genes with disease severity
Sequence assembly	Assembly is a process which aligns and merges fragments from a longer DNA sequence in order to reconstruct the original sequence. k-mers are short fragments of DNA of length k.
Sequence Quality	The quality of a genome assembly can be assessed by looking at some basic statistics on the assembly, but also by using an external reference.
Inspecting sequence graphs	A genome assembly is fragmented because of repeats in the genome. The assembly graph displays possible connections between contigs.
Annotation	Genome annotation includes prediction of protein-coding genes, as well as other functional genome units. It often starts by identifying open reading frames. Predicted sequences are further analysed with BLAST. Larger DNA sequences or genomes require automated prediction and annotation.
Pangenome analysis	The microbial pangenome is the union of genes in genomes of interest. The microbial core genome is the intersection of genes shared by genomes of interest. Roary is a pipeline to determine genes of the pangenome.
Phylogenetic trees from the core genome	A tree can be generated from a combined set of proteins for better resolution.
Bacterial GWAS	Contingency testing of gene presence/absence can associate a genotype with a phenotype in bacterial genomes, similar to GWAS in clinical genetics.
Wrapup Day 3 and 4	Phage-Derived Protein Induces Increased Platelet Activation and Is Associated with Mortality in Patients with Invasive Pneumococcal Disease
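
The key points on redirection, `for` loops, and `echo` progress messages can be tried out together in one small script. This is a minimal sketch, not part of the lesson material: the directory `redirection_demo` and the file names and contents are made up for illustration.

```bash
#!/usr/bin/env bash

# Keep all example files in one subdirectory (hypothetical name).
mkdir -p redirection_demo

# Two tiny, made-up input files.
printf 'ACGT\nNNNN\nACGN\n' > redirection_demo/sample_1.txt
printf 'NNNN\nTTTT\n' > redirection_demo/sample_2.txt

# `>` creates the file, or overwrites it if it already exists.
echo "lines containing N per file" > redirection_demo/summary.txt

# The for loop runs the same commands once for every matching file.
for file in redirection_demo/sample_*.txt
do
    # Progress update, as suggested for automated workflows.
    echo "Processing $file"

    # `|` sends the output of grep to wc as its input.
    count=$(grep N "$file" | wc -l)

    # `>>` appends to the file without overwriting what is there.
    # $((count)) strips any padding that wc adds on some systems.
    echo "$file $((count))" >> redirection_demo/summary.txt
done

cat redirection_demo/summary.txt
```

Running the script again starts the summary from scratch, because the first `echo` uses `>` rather than `>>`.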
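
The pangenome (union) and core genome (intersection) key points can be illustrated with plain shell set operations. This is only a sketch of the set logic, assuming two hypothetical genomes whose genes have already been given shared names and listed one per line in sorted order. A real pipeline such as Roary first has to decide which genes count as the same, which this example skips entirely.

```bash
#!/usr/bin/env bash

# Keep all example files in one subdirectory (hypothetical name).
mkdir -p pangenome_demo

# Made-up gene lists, one gene per line, already sorted.
printf 'geneA\ngeneB\ngeneC\n' > pangenome_demo/genome_1.txt
printf 'geneB\ngeneC\ngeneD\n' > pangenome_demo/genome_2.txt

# Pangenome: the union of genes across both genomes.
# `sort -u` sorts the combined list and keeps each gene once.
cat pangenome_demo/genome_1.txt pangenome_demo/genome_2.txt \
    | sort -u > pangenome_demo/pangenome.txt

# Core genome: the intersection of genes shared by both genomes.
# `comm -12` suppresses lines unique to either file, leaving only
# the lines common to both. It expects sorted input.
comm -12 pangenome_demo/genome_1.txt pangenome_demo/genome_2.txt \
    > pangenome_demo/core.txt

echo "Pangenome:"
cat pangenome_demo/pangenome.txt
echo "Core genome:"
cat pangenome_demo/core.txt
```

With these inputs the pangenome holds four genes (`geneA` to `geneD`) and the core genome holds the two shared ones (`geneB` and `geneC`).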

Glossary

The glossary would go here, formatted as:

{:auto_ids}
key word 1
:   explanation 1

key word 2
:   explanation 2

({:auto_ids} is needed at the start so that Jekyll will automatically generate a unique ID for each item to allow other pages to hyperlink to specific glossary entries.) This renders as:

key word 1: explanation 1
key word 2: explanation 2