Introduction Day3

Overview

Teaching: 20 min
Exercises: 30 min
Questions
  • Where does the dataset come from?

  • How do I log in?

  • Where are the files located?

Objectives
  • Understand the data

  • Choose login details

  • Familiarize yourself with the environment

Introduction

We will use the same Etherpad as in the first two days: http://pad.software-carpentry.org/2018-04-03-Utrecht. The Etherpad contains the link to the shared documents, the IP address of the server, etc.

Dataset

The dataset is introduced in this presentation (https://www.dropbox.com/s/hi7ivyml0vle2k1/Bacterial%20GWAS%20DEC%202018.pptx?dl=0). In total, we will analyze 62 genomes, one of which is a closed reference genome (OXC141).

How to log in

Follow the instructions at https://aschuerch.github.io/Microbial-Genomics-2018/02-intro-cloud/index.html. Use the IP address listed in the Etherpad. We will be using a shared server because the tools for this part of the course need more computing power than a single small Amazon instance provides. Because the server is shared, everyone has their own username and password (see the Etherpad).

Where are the files located

In your home folder (~/, /home/student), you will find the directory "reads", which contains symlinks to the read files used in this study. Because assembling all the genomes in this study would take too long, each person will assemble only two genomes. We will combine everyone's outputs later on.
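
To check that the symlinks in "reads" resolve before you start an assembly, a short script along the following lines may help. It is only a sketch: the function name `list_read_links` is made up for this illustration, and the script assumes the directory is at `~/reads`, as described above. It makes no assumption about how the read files are named.

```python
import os
from pathlib import Path


def list_read_links(reads_dir):
    """Return (name, target, exists) for every symlink in reads_dir, sorted by name.

    list_read_links is a hypothetical helper written for this example.
    """
    entries = []
    for path in sorted(Path(reads_dir).iterdir()):
        if path.is_symlink():
            # The path stored in the link itself, without resolving it
            target = os.readlink(path)
            # exists() follows the link, so it is False for a broken symlink
            entries.append((path.name, target, path.exists()))
    return entries


if __name__ == "__main__":
    # Assumed location of the directory, taken from the text above
    reads = Path.home() / "reads"
    if reads.is_dir():
        for name, target, ok in list_read_links(reads):
            status = "ok" if ok else "BROKEN"
            print(f"{name} -> {target} [{status}]")
    else:
        print(f"{reads} not found")
```

Any link reported as BROKEN points to a file that is missing or not readable from your account, which is worth raising with the instructors before continuing.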

Key Points

  • Sequencing S. pneumoniae patient isolates to determine associations of bacterial genes with disease severity