Molecular Basis of Bacterial Infections course, computer practical: References

Key Points

Introduction
  • After the practical, you will be familiar with the basics of working on the command line and will have a list of predicted secreted proteins for two different species.

Logging onto Cloud
  • You can use one set of log-in credentials for many instances

  • Logging off an instance is not the same as turning it off.

Introducing the Shell
  • The shell gives you the ability to work more efficiently by using keyboard commands rather than a GUI.

  • Useful commands for navigating your file system include: ls, pwd, and cd.

  • Most commands take options (flags) which begin with a -.

  • Tab completion can reduce errors from mistyping and make work more efficient in the shell.
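
The commands above can be tried out with a short session like the one below. This is a minimal sketch: it works in a temporary directory, and the names reads and sample_1.fastq are made up for illustration.

```bash
# Work in a throwaway directory so none of your own files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# pwd prints the directory you are currently in.
pwd

# Create an example directory and an empty file (hypothetical names).
mkdir reads
touch reads/sample_1.fastq

# ls lists the contents of a directory.
ls

# Options (flags) start with a dash: -l asks ls for the long listing format.
ls -l reads

# cd moves into a directory; pwd confirms the new location.
cd reads
pwd
```

When typing these commands interactively, pressing Tab after the first few letters of reads lets the shell complete the name for you.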

Navigating Files and Directories
  • The /, ~, and .. characters represent important navigational shortcuts.

  • Hidden files and directories start with . and can be viewed using ls -a.

  • Relative paths specify a location starting from the current location, while absolute paths specify a location from the root of the file system.
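
A short session can make the difference between these path types concrete. This is only a sketch: the directory names project and data and the file .hidden_notes are invented for the example.

```bash
# ~ is shorthand for your home directory; echo shows what it expands to.
echo ~

# Set up a small example tree in a temporary location.
workdir=$(mktemp -d)
mkdir -p "$workdir/project/data"
touch "$workdir/project/.hidden_notes"

# Absolute path: it starts at the root (/), so it works from any location.
cd "$workdir/project/data"
pwd

# Relative path: .. means "one directory up" from where you are now.
cd ..
pwd

# Plain ls skips names that start with a dot; ls -a includes them.
visible=$(ls)
all=$(ls -a)
echo "ls shows:"
echo "$visible"
echo "ls -a shows:"
echo "$all"
```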

Working with Files and Directories
  • You can view file contents using less, cat, head or tail.

  • The commands cp, mv, and mkdir are useful for manipulating existing files and creating new directories.

  • You can view file permissions using ls -l and change permissions using chmod.

  • The history command and the up arrow on your keyboard can be used to repeat recently used commands.
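
The following sketch runs through these commands on a small example file. The file and directory names (notes.txt, backup) are hypothetical, and everything happens in a temporary directory.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Create a four-line example file to work with.
printf 'line1\nline2\nline3\nline4\n' > notes.txt

# cat prints the whole file; head and tail print only its start or end.
cat notes.txt
head -n 2 notes.txt
tail -n 1 notes.txt

# mkdir creates a directory, cp copies a file, mv moves or renames it.
mkdir backup
cp notes.txt backup/notes_copy.txt
mv backup/notes_copy.txt backup/notes_old.txt

# ls -l shows the permissions in the first column of its output.
ls -l notes.txt

# chmod a-w removes write permission for all users, protecting the file.
chmod a-w notes.txt
ls -l notes.txt
```

For long files, less notes.txt opens an interactive viewer instead of printing everything at once; press q to leave it.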

Redirection
  • grep is a powerful search tool with many options for customization.

  • >, >>, and | are different ways of redirecting output.

  • command > file redirects a command’s output to a file.

  • command >> file redirects a command’s output to a file without overwriting the existing contents of the file.

  • command_1 | command_2 pipes the output of the first command to the second command as its input.
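
The three operators can be compared side by side on a tiny FASTA file. This is a sketch with made-up file names and sequences (example.fasta, headers.txt), run in a temporary directory.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Create a small example FASTA file with two records.
printf '>seq1\nATGC\n>seq2\nGGCC\n' > example.fasta

# grep finds the lines containing ">" (the FASTA headers).
# The first > is quoted so the shell does not treat it as a redirect;
# the second > writes the result to headers.txt, replacing any old content.
grep '>' example.fasta > headers.txt

# >> appends to the end of headers.txt instead of overwriting it.
echo '>seq3' >> headers.txt
cat headers.txt

# | sends the output of grep straight into wc -l, which counts the lines.
grep '>' example.fasta | wc -l
```

Forgetting the quotes around '>' in the grep pattern is a common mistake: the shell would then redirect output into the file you meant to search and empty it.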

Trimming
  • Sequencing data needs to be trimmed or corrected before use

  • Spend the time to organize your file system. Your future self will thank you!

Sequence assembly
  • Assembly is a process which aligns and merges fragments from a longer DNA sequence in order to reconstruct the original sequence.

  • k-mers are the subsequences of length k contained in a DNA sequence.
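
To illustrate what k-mers look like, the sketch below lists every k-mer of a short sequence by sliding a window of length k along it one base at a time. The sequence and the value of k are arbitrary examples, and this is not how an assembler such as SPAdes or SKESA is implemented internally.

```bash
seq="ATGCGT"   # hypothetical example sequence
k=3            # k-mer length chosen for the example

# A sequence of length L contains L - k + 1 k-mers,
# so the last window starts at position L - k + 1.
last_start=$(( ${#seq} - k + 1 ))

i=1
while [ "$i" -le "$last_start" ]; do
    # cut -cM-N extracts characters M through N (positions count from 1).
    kmer=$(printf '%s' "$seq" | cut -c"$i"-"$(( i + k - 1 ))")
    echo "$kmer"
    i=$(( i + 1 ))
done
```

Consecutive k-mers overlap by k - 1 bases, and assemblers use such overlaps to join fragments into longer sequences.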

Annotation
  • Genome annotation includes prediction of protein-coding genes, as well as other functional genome units

  • Annotation starts by identifying open reading frames

  • Predicted protein-coding sequences are further analysed with BLAST

  • Larger DNA sequences or genomes require automated prediction and annotation

Prediction of secreted proteins
  • Secreted proteins are predicted by identifying a signal peptide

  • The presence of a signal peptide does not necessarily mean that the protein is secreted

  • No matter which way you want to move data, it’s easier to start the transfer from your local machine

Wrapup
  • Computational predictions need to be validated by laboratory experiments

References

  • SPAdes

  • PROKKA

  • signalP

  • PSORTdb

  • SKESA