Day 1

Author

SW

Introduction

Today we will discuss and perform genome assembly. Assembly refers to the translation of raw sequencing data into a representation of the source DNA genome. You will learn about the types of data and algorithms you can use to accomplish this, along with their advantages and disadvantages. Since this is the first day of the workshop, we will also need some time to get comfortable running software on the computing facility.

I have included some questions for the first day. This is not a test! They are just to help us check whether important information is missing and to create opportunities to ask for help.

1. Logging on to the computing environment

Please read through the Kabré user guide. You will need to submit the commands in this tutorial using SLURM scripts.

A SLURM script contains some parameters for the program you wish to run, and can also allocate your “job” to different types of computer. The “partition” (the type of computer available) you choose is important. For most tasks, a partition like nu-wide would be suitable, since its nodes have a moderate amount of RAM and a moderate number of CPUs. For very computationally demanding tasks such as genome assembly, dribe-long would be more suitable, because genome assembly often requires large amounts of memory and many CPUs to complete in a reasonable time-frame.

Since many people in the workshop will try to submit jobs at the same time, jobs are likely to take a long time to finish. I will try to provide access to all the required data in the folder /home/taller-2019/Workshop_materials/, along with the expected results files. I have included links within the explanatory text that point to the primary literature. Further reading is worthwhile if you are interested in the methodological details.

You can check which jobs are “queued” and which partitions are very busy, with the command:

squeue

To submit your own job, first write a new SLURM script for a specific program command, like this:

nano example_script.slurm

Here’s an example script, which sets some queue parameters (#SBATCH), loads the correct programs (module load), and then runs the program command itself (srun …):

#!/bin/bash
#SBATCH --job-name=pi_threads
#SBATCH --output=result.txt
#SBATCH --partition=nu-wide
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

module load example_module

srun example_program -p example_parameter

You can then save and close the script with Ctrl + X, followed by Y and then Enter to confirm the file name. Then you can submit the script like so:

sbatch example_script.slurm

You may need to find the appropriate module for your command, by running:

module avail

Once you find the correct module for the command you wish to run, you would copy it from this list into your SLURM script, like so:

module load jellyfish/2.3.0
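Putting these pieces together, the sketch below writes a complete SLURM script to a file so that you can inspect it before submitting. The partition and module names are taken from the text above; the job name, log file name, CPU count, time limit, k-mer size and the input and output file names are illustrative assumptions, not values prescribed by this workshop.

```shell
# Write a hypothetical SLURM script for a Jellyfish k-mer counting job.
# Assumed for illustration: job name, log name, CPU count, time limit,
# k-mer size (21), hash size (100M) and the input/output file names.
cat > jellyfish_count.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=jellyfish_count
#SBATCH --output=jellyfish_count.log
#SBATCH --partition=nu-wide
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:30:00

module load jellyfish/2.3.0

# Count canonical 21-mers using the CPUs that SLURM allocated.
srun jellyfish count -C -m 21 -s 100M -t "$SLURM_CPUS_PER_TASK" \
    -o kmer_counts.jf SRR18106304.fastq
EOF

# Show the queue parameters that were written to the script.
grep '^#SBATCH' jellyfish_count.slurm

# On the cluster, the script would then be submitted with:
#   sbatch jellyfish_count.slurm
```

The heredoc is quoted ('EOF') so that $SLURM_CPUS_PER_TASK is written literally into the script and only expanded when the job runs.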

2. Platforms & Formats

The ultimate goal of whole genome sequencing is to reconstruct the genetic state within an organism, exactly as it occurs in vivo. This means the complete uninterrupted base-pair sequence of any chromosomes, including mitochondrial genomes and potentially any endosymbiotic organisms. At the time of writing, the best technology we have can only “read” DNA molecules in continuous sections of varying lengths (and with varying accuracy). These sub-samples of larger genome molecules can be reconstructed to describe the larger sequence; this is called de novo genome assembly.

The properties of these DNA “sequencing reads” depend on the technology used to generate them. DNA sequencing technology has undergone many fundamental technical advances. Broadly speaking, as of 2022, there are three dominant platforms, which differ by three key parameters: 1) cost per base-pair, 2) error profile and 3) read length. The platforms are:

Sequencing by synthesis (Illumina)

Single molecule real-time (PacBio)

Nanopore sensing (Oxford Nanopore Technologies)

The field is constantly changing. In recent years, Illumina paired reads have been considered accurate and cheap; however, due to their short length, they are better suited to population-level detection of SNPs and small indels than to genome assembly. Until recently, PacBio and Nanopore reads were much longer than Illumina reads, but had high error rates and could be expensive. These reads were considered better for genome assembly projects, sometimes in combination with more accurate Illumina reads (a.k.a. hybrid assembly). PacBio and Nanopore reads have since improved considerably in quality, but remain expensive options. The combination of long read length and high accuracy is the most appropriate data for genome assembly. Novel DNA preparation techniques such as “haplotagging” are likely to prolong the usefulness of Illumina reads, by enabling the detection and long-range linkage of SNPs.

Below you can see an interactive plot of different data sets in the Sequence Read Archive (SRA), highlighting differences in read length produced by the different platforms. A recent review of genome assemblies for Lepidopteran (butterfly/moth) genomes suggested that PacBio HiFi data is currently the best performing approach for assembly. We will come back to this later in the assembly section.

Figure 1: A comparative overview of different DNA sequencing technologies (restricted to Order: Lepidoptera datasets)

Let’s have a look at some public data from the Sequence Read Archive (SRA). We’ll be using E. coli bacteria, because it has a small genome and therefore a lower computational burden, but the general principles we’ll explore can be applied to other organisms (with some caveats).

Typically you could download the reads associated with an SRA accession using the commands below, but it can take some time, so today you can access pre-downloaded data from the path /home/taller-2019/Workshop_materials/.

# Illumina HiseqX
fasterq-dump SRR18106304
# PacBio HiFi
fasterq-dump SRR10971019
# Oxford Nanopore R9
fasterq-dump ERR1674581

These commands should download the “fastq” formatted data from these experiments. If you want to find out more about an experiment, you can search for its accession code (e.g. SRR18106304) on https://www.ncbi.nlm.nih.gov/.
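Once a fastq file is available, a quick sanity check is to count how many reads and bases it contains. The sketch below assumes the common four-lines-per-record fastq layout, and uses a tiny made-up file (toy_reads.fastq) so that it runs anywhere; on the cluster you would point it at a real file such as SRR18106304.fastq instead.

```shell
# Create a tiny two-record fastq file for illustration.
# (toy_reads.fastq and its contents are made up, not workshop data.)
cat > toy_reads.fastq <<'EOF'
@read1 length=8
ACGTACGT
+read1 length=8
FFFFFFFF
@read2 length=4
GGCC
+read2 length=4
FF:F
EOF

# Each record spans 4 lines, so the number of reads is lines / 4.
n_lines=$(wc -l < toy_reads.fastq)
n_reads=$((n_lines / 4))
echo "reads: $n_reads"

# The sequence is the 2nd line of every record; sum the lengths with awk.
total_bases=$(awk 'NR % 4 == 2 { total += length($0) } END { print total }' toy_reads.fastq)
echo "bases: $total_bases"
```

Dividing the total number of bases by the expected genome size gives a rough estimate of sequencing depth.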

Q1:

A. What organism are these datasets derived from?

B. What platform was used for each dataset?

C. Which organisations submitted each dataset?

D. Which type of data is most suitable for genome assembly?

E. Which type of data is the cheapest for detecting polymorphisms?

Let’s have a quick look at the different data types using the “head” command. First, let’s look at some Illumina data by previewing the first 4 lines of the fastq file:

##Bash/command line code
head -n 4 SRR18106304.fastq

Here we can see a single-ended (unpaired) Illumina read. The first line contains the read ID and the read length. The second line shows the base-pair sequence. The third line is a separator that begins with “+” and here repeats the read ID. The fourth line shows the base “quality” information (an estimate of the accuracy of each base call), with one character per base.

@SRR18106304.1 1 length=150
GNTGATTGCTGTTTGCCGCTTTATCCCACGCATATTGCTGAACCTGTGGCATCACCGTGGACTTCAACTGCGCTTTTTTCAGCAACGTCATGGTTAAGACGTTTGCCACCGCATCTGCCAGTGGAGACTCCGGTGTATCGAGGATCAGGC
+SRR18106304.1 1 length=150
F#FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFF
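Each character in the quality line encodes a Phred score for the base at the same position. The sketch below decodes the four distinct characters that appear in the read above, assuming the Phred+33 encoding (ASCII code minus 33); the helper name phred_score is hypothetical.

```shell
# Convert one fastq quality character to its Phred score,
# assuming the Phred+33 encoding.
phred_score() {
    # printf '%d' "'X" prints the ASCII code of the character X.
    ascii=$(printf '%d' "'$1")
    echo $((ascii - 33))
}

# Decode the four distinct quality characters seen in the Illumina read.
for q in 'F' ':' ',' '#'; do
    echo "$q -> Q$(phred_score "$q")"
done
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so Q37 is roughly a 1 in 5,000 chance of a wrong base call. Note that the ambiguous base N at position 2 of the read carries the lowest quality character, #.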

Now let’s have a look at some PacBio HiFi data, by previewing the first 4 lines of the read fastq file:

##Bash/command line code
head -n 4 SRR10971019.fastq

Here we can see much more data, due to the much greater read length.

@SRR10971019.1 1 length=15748
TTGACCGATTGAGGTTTCCTATAGGTATTCATTCAAATATATCTCAGTTAGGAGTACTACTATTGTGAGTAAACGTCGTTATCTTACCGGTAAAGAAGTTCAGGCCATGATGCAGGCGGTTTGTTACGGGGCAACGGGAGCCAGAGATTATTGTCTTATTCTGTTGGCATATCGGCATGGGATGCGTATTAGTGAACTGCTTGATCTGCATTATCAGGACCTTGACCTTAATGAAGGTAGAATAAATATTCGCCGACTGAAGAACGGATTTTTCTACCGTTCACCCGTTACGTTTTGATGAGCGTGAAGCCGTGGAACGCTGGACCCAGGAACGTGCCTAACTGGAAAGGCGCTGACCGGACTGACGCTATATTTATTTCTCGCCGCGGGAGTCGGCTTTCTCGCCAGCAGGCCTATCGCATTATTCGCGATGCCGGTATTGAAAGCTGGAACCGTAACGCAGACTCATCCTCATATGTTAAGGCATGCTTGCGGTTATGAATTGGCGGAGCGTGGTGCAGATACTCGTTTAATTCAGGATTATCTCGGGCATCGAAATATTCGCCATACTGTGCGTTATACCGCCAGTTAATGCTGCTCGTTTTGCCGGATTATGGGAAAGAAATAATCTCATAAACGAAAAATTAAAAAGAGAAGAGGTTTGATTTAACTTATTGATAATAAAGTTAAAAAAACAAATAAATACAAGACAATTGGGGCCAAACTGTCCATATCATAAATAAGTTACGTATTTTTTCTCAAGCATAAAAATATTAAAAAACGACAAAAAGCATCTAACTGTTTGATATGTAAATTATTTCTATTGTAAATTAATTTCACATCACCTCCGCTATATGTAAAGCTAACGTTTCTGTGGCTCGACGCATCTTCCTCATTCTTCTCTCCAAAAACCACCTCATGCAATATAAACATCTATAAATAAAGATAACAATAGAATATTAAGCCAACAAATAAACTGAAAAAGTTTGTCCGCGATGCTTTCCTCTATGAGTCAAAATGGCCCCAAATGTTTCATCTTTTGGGGGAAAACTGTGCAGTGTTGGCAGTCAAACTCGTTTACAAAACAAAGTGTACAGAACGACTGCCCATGTCGATTTAGAAATAGTTTTTTGAAAGGAAAGCAGCATGAAAATTAAAACTCTGGCAATCGTTGTTCTGTCGGCTCTGTCCCTCAGTTCTACAGCGGCTCTGGCCGCTGCCACGACGGTTAATGGTGGGACCGTTCACTTTAAAGGGGAAGTTGTTAACGCCGCTTGCGCAGTTGATGCAGGCTCTGTTGATCAAACCGTTCAGTTAGGACAGGTTCGTACCGCATCGCTGGCACAGGAAGGAGCAACCAGTTCTGCTGTCGGTTTTAACATTCAGCTGAATGATTGCGATACCAATGTTGCATCTAAAGCCGCTGTTGCCTTTTTAGGTACGGCGATTGATGCGGGTCATACCAACGTTCTGGCTCTGCAGAGTTCAGCTGCGGGTAGCGCAACAAACGTTGGTGTGCAGATCCTGGACAGAACGGGTGCTGCGCTGACGCTGGATGGTGCGACATTTAGTTCAGAAACAACCCTGAATAACGGAACCAATACCATTCCGTTCCAGGCGCGTTATTTTGCAACCGGGGCCGCAACCCCGGGTGCTGCTAATGCGGATGCGACCTTCAAGGTTCAGTATCAATAACCTACCCAGGTTCAGGGACGTCATTACGGGCAGGGATGCCCACCCTTGTGCGATAAAAATAACGATGAAAAGGAAGAGATTATTTCTATTAGCGTCGTTGCTGCCAATGTTTGCTCTGGCCGGAAATAAATGGAATACCACGTTGCCCGGCGGAAATATGCAATTTCAGGGCGTCATTATTGCGGAAACTTGCCGGATTGAAGCCGGTGATAACAAATGACGGTCAATATGGGGCAAATCAGCAGTAACCGGTTTCATGCGGTTGGGGAAGATAGCGCACCGGTGCCTTTTGTTATTCATTTACG
GGAATGTAGCACGGTGGTGAGTGAACGTGTAGGTGTGGCGTTTCACGGTGTCGCGGATGGTAAAAATCCGGATGTGCTTTCCGTGGGAGAGGGGCCAGGGATAGCCACCAATATTGGCGTAGCGTTGTTTTGATGATGAAGGAAACCTCGTACCGATTAATCGTCCTCCAGCAAACTGGAAACGGCTTTATTCAGGCTCTACTTCGCTACATTTCATCGCCAAATATCGTGCTACCGGGCGTCGGGTTACTGGCGGCATCGCCAATGCCCAGGCCTGGTTCTCTTTAACCTATCAGTAATTGTTCAGCAGATAATGTGATAACAGGAACAGGACAGTGAGTAATAAAAACGTCAATGTAAGGAAATCGCAGGAAATAACATTCTGCTTGCTGGCAGGTATCCTGATGTTCATGGCAATGATGGTTGCCGGACGCGCTGAAGCGGGAGTGGCCTTAGGTGCGACTCGCGTAATTTATCCGGCAGGGCAAAAACAAGAGCAACTTGCCGTGACAAATAATGATGAAAATAGTACCTATTTAATTCAATCATGGGTGGAAAATGCCGATGGTGTAAAGGATGGTCGTTTTATCGTGACGCCTCCTCTGTTTGCGATGAAGGGAAAAAAAGAGAATACCTTACGTATTCTTGATGCAACAAATAACCAATTGCCACAGGACCGGGAAAGTTTATTCTGGATGAACGTTAAAGCGATTCCGTCAATGGATAAATCAAAATTGACTGAGAATACGCTACAGCTCGCAATTATCAGCCGCATTAAACTGTACTATCGCCCGGCTAAATAGCGTTGCCACCCGATCAGGCCGCAGAAAAATTAAGATTTCGTCGTAGCGCGAATTCTCTGACGCTGATTAACCCGACACCCTATTACCTGACGGTAACAGAGTTGAATGCCGGAACCCGGGTTCTTGAAAATGCATTGGTGCCTCCAATGGGCGAAAGCACGGTTAAATTGCCTTCTGATGCAGGAAGCAATATTACTTACCGAACAATAAATGATTATGGCGCACTTACCCCCAAAATGACGGGCGTAATGGAATAACGCAGGGGGAATTTTTCGCCTGAATAAAAAGAATTGACTGCCGGGTGATTTTAAGCCGGAGGAATAATGTCATATCTGAATTTAAGACTTTACCAGCGAAACACACAATGCTTGCATATTCGTAAGCATCGTTTGGCTGGTTTTTTTGTCCGACTCGTTGTCGCCTGTGCTTTTGCCGCACAGGCACCTTTGTCATCTGCCGACCTCTATTTTAATCCGCGCTTTTTAGCGGATGATCCCCAGGCTGTGGCCGATTTATCGCGTTTTGAAAATGGGCAAGAATTACCGCCAGGGACGTATCGCGTCGATATCTATTTGAATAATGGTTATATGGCAACGCGTGATGTCACATTTAATACGGGCGACAGTGAACAAGGGATTGTTTCCCTGCCTGACACGCGCGCAACTCGCCAGTATGGGGCTGAATACGGCTTCTGTCGCCGGTATGAATCTGCTGGCGGATGATGCCTGTGTGCCATTAACCACAATGGTCCAGGACGCTACTGCGCATCTGGATGTTAGGTCAGCAGCGACTGAACCTGACGATCCCTCAGGCATTTATGAGTAATCGCGCGCGTGGTTATATTCCTCCTGAGTTATGGGATCCCGGTATTAATGCCGGATTGCTCAATTATAATTTCAGCGGAAATAGTGTACAGAATCGGATTGGGGGTAACAGCCATTATGCATATTTAAACCTACAGAGTGGGTTAAATATTGGTGCGTGGCGTTTACGCGACAATACCACCTGGAGTTATAACAGTAGCGACAGATCATCAGGTAGCAAAAATAAATGGCAGCATATCAATACCTGGCTTGAGCGAGACATAATACCGTTACGTTCCCGGCTGACGCTGGGTGATGGTTATACTCAGGGCGATATTTTCGATGGTATTAACTTTCGCGGCGCACAATTGGCCTCAGATGACAATATGTTACC
CGATAGTCAAAGAGGATTTGCCCCCGGTGATCCACGGTATTGCTCGTGGTACTGCACAGGTCACTATTAAACAAAATGGGTATGACATTTATAATAGTACGGTGCCACCGGGGCCTTTTACCATCAACGATATCTATGCCGCAGGTAATAGTGGTGACTTGCAGGTAACGATTCAAAGAGGCTGACGGCAGCACGCAGATTTTTACCGTACCCTATTCGTCAGTCCCGCTTTTGCAACGTGAAGGGCATACTCGTTATTCCATTACGGCAGGAGAATACCGTAGTGGAAATGCGCAGCAGGAAAAAACCCGCTTTTTCCAGAGTACATTACTCCACGGCCTTCCGGCTGGCTGGACAATATATGGTGGAACGCAACTGGCGGATCGTTATCGTGCTTTTAATTTCGGTATCGGGAAAAACATGGGGGCACTGGGCGCTCTGTCTGTGGATATGACGCAGGCTAATTCCACACTTCCCGATGACAGTCAGCATGACGGACAATCGGTGCGTTTTCTCTATAACAAATCGCTCAATGAATCAGGCACGAATATTCAGTTAGTGGGTTACCGTTATTCGACCAGCGGATATTTTAATTTCGCTGATACAACATACAGTCGAATGAATGGCTACAACATCGAAACACAGGACGGAGTTATTCAGGTTAAGCCGAAATTCACCGACTATTACAACCTCGCTTATAACAAACGCGGGAAATTACAACTCACCGTTACTCAGGCAACTCGGGCGCACATCAACACTGTATTTGAGTGGTAGCCATCAAACTTATTGGGGAACGAGTAATGTCGATGAGCAATTCCAGGCTGGATTAAATACTGCGTTCGAAGATATCAACTGGACGCTCAGCTATAGCCTGACGAAAAACGCCTGGCAAAAAGGACGGGATCAGATGTTAGCGCTTAACGTCAATATTCCTTTCAGCCACTGGCTGCGTTCTGACAGTAAATCTCAGTGGCGACATGCCAGTGCCAGCTACAGCATGTCACACGATCTCAACGGTCGGATGACCAATCTGGCTGGTGTATACGGTACGTTGCTGGAAGACAACAACCTCAGCTATAGCGTGCAAACCGGCTATGCCGGGGGAGGCGATGGAAATAGCGGAAGTACAGGCTACGCCACGCTGAATTATCGCGGTGGTTACGGCAATGCCAATATCGGTTACAGCCATAGCGATGATATTAAGCAGCTCTATTACGGAGTCAGCGGTGGGGTACTGGCTCATGCCAATGGCGTAACGCTGGGGCAGCCGTTAAACGATACGGTGGTGCTTGTTAAAGCGCCTGGCGCAAAAGATGCAAAAGTCGAAAACCAGACGGGGGTGCGTACCGACTGGCGTGGTTATGCCGTGCTGCCTTATGCCACTGAATATCGGGAAAATAGAGTGGCGCTGGATACCAATACCCTGGCTGATAACGTCGATTTAGATAACGCGGTTGCTAACGTTGTTCCCACTCGTGGGGCGATCGTGCGAGCAGAGTTTAAAGCGCGCGTTGGGATAAAACTGCTCATGACGCTGACCCACAATAATAAGCCGCTGCCGTTTGGGGCGATGGTGACATCAGAGAGTAGCCAGAGTAGCGGCATTGTTGCGGATAATGGTCAGGTTTACCTCAGCGGAATGCCTTTAGCGGGAAAAGTTCAGGTGAAATGGGGAGAAGAGGAAAATGCCTCACTGTGTCGCCAATTATCAACTGCCACCAGAGAGTCAGCAGCAGTTATTAACCCAGCTATCAGCTGAATGTCGTTAAGGGGGCGTGATGAGAAACAAACCTTTTTATCTTCTGTGCGCTTTTTTGTGGCTGGCGGTGAGTCACGCTTTGGCTGCGGATAGCACGATTACTATCCGCGGCTATGTCAGGGATAACGGCTGTAGTGTGGCCGCTGAATCAACCAATTTTACTGTTGATCTGATGGAAAACGCGGCGAAGCAATTTAACAACATTGGCGCGACGACTCCTGTTGTTCCAATTTCGTATTTT
GCTGTCACCCTGTGGTAATGCCGTTTCTGCCGTAAAGGTTGGGTTTACTGGCGTTGCAGATAGCCACAATGCCAACCTGCTTGCACTTGAAAATACGGTGTCAGCGGCTTCGGGACTGGGAATACAGCTTCTGAATGAGCAGCAAAATCAAATACCCCTTAATGCTCCATCGTCCGCGCTTTCGTGGACGACCCTGACGCCGGGTAAAACCAAATACGCTGAATTTTTACGCCCGGCTAATGGCGACACAGGTGCCTGTCACTGCGGGGCATATCAATGCCACGGCTACCTTCACTCTTGAATATCAGTAACTGGAGATGCTCATGAAATGGTGCAAACGTGGGTATGTATTGGCGGCAATATTGGCGCTCGCAAGGTGCGACGATACAGGCAGCCGATGTCACCATCACGGTGAACGGTAAGGTCGTCGCCAAACCGTGTACGGTTTCCACCACCAATGCCACGGTTGATCTCGGCGATCTTTATTCTTTCAGTCTTATGTCTGCCGGGGCGGCATCGGCCTGGCATGATGTTGCGCTTGAGTTGACTAATTGTCCGGTGGGAACGTCGAGGGTCACTGCCAGCTTCAGCGGGGCAGCCGACAGTACCGGATATTATAAAAACCAGGGGACCGCGCAAAACATCCAGTTAGAGCTACAGGATGACAGTGGCAACACATTGAATACTGGCGCAACCAAAACAGTTCAGGTGGATGATTCCTCACAATCAGCGCACTTCCCGTTACAGGTCAGAGCATTGACAGTAAATGGCGGAGCCACTCAGGGAACCATTCAGGCAGTGATTAGCATCACCTATACCTACAGCTGAACCCGAAGAGATGATTGTAATGAAACGAGTTATTACCCTGTTTGCTGTACTGCTGATGGGCTGGTCGGTAAATGCCTGGTCATTCGCCTGTAAAACCGCCAATGGTACCGCTATCCCTATTGGCGGTGGCAGCGCCAATGTTTATGTAAACCTTGCGCCCGTCGTGAATGTGGGGCAAAACCTGGTCGTGGATCTTTCGACGCAAATCTTTTGCCATAACGATTATCCGGAACCATTACAGACTATGTCACACTGCAACGAGGCTCGGCTTATGGCGGCGTGTTATCTAATTTTTCCGGGACCGTAAAATATAGTGGCAGTAGCTATCCATTTCCTACCACCAGCGAAACGCCGCGCGTTGTTTATAATTCGAGAACGGATAAGCCGTGGCCGGTGGCGCTTTATTTGACGCCTGTGAGCAGTGCGGGCGGGGTGGCGATTAAAGCTGGCTCATTAATTGCCGTGCTTATTTTGCGACAGACCAACAACTATAACAGCGATGATTTCCAGTTTGTGTGGAATATTTACGCCAATAATGATGTGGTGGTGCCTACTGGCGGCTGCGATGTTTCTGCTCGTGATGTCACCGTTACTCTGCCGGACTACCCTGGTTCAGTGCCAATTCCTCTTACCGTTTATTGTGCGAAAAGCCAAAACCTGGGGTATTACCTCTCCGGCACAACCGCAGATGCGGGCAACTCGATTTTCACCAATACCGCGTCGTTTTCACCTGCACAGGGCGTCGGCGTACAGTTGACGCGCAACGGTACGATTATTCCAGCGAATAACACGGTATCGTTAGGAGCAGTAGGGACTTCGGCGGTGAGTCTGGGATTAACGGCAAATTATGCACGTACCGGAGGGCAGGTGACTGCAGGGAATGTGCAATCGATTATTGGCGTGACTTTTGTTTATCAATAAAGAAATCACAGGACATTGCTAATGCTGGTACGCATATTACCTGAAGCTAAAAACCTGCACGTTAGCCCTTTGTAGGCCAGATAAGACGCGTCAGCGTCGCATCTGGCATAAACAAAGCGCACTTTGCTGGTCTGTTCCCCTCACCCTAACCCTCTCCCCGGAGGGGCGAGGGGACTGTCCGGGCACATTTTTAGACTTTGTCATCAGTCTGAGCCTGCCATTGGCAGGCTCTGGTGTCCTTTTACGCTACC
ATGCTAATAATCAGCACAATAATCAGCCCAACCACGGAGTTGACCAGCTCCAGCAGACCCAGGTTTTCAACGTGTCTTTACTGACAGGTCAAAGTAACCTTTGAACAACCAGAAAGATGCACGTTAATGTGGGTGAGGGTGTTGGAACCCGCAGCCGTCGCCAGTACAGCAGCGCCGGATTCACGCCAACCAGCTGACCAGTTGCTGGATTCTAGGATTGCAGCACTGATAATCCCGGCGGCGGTCATCGCCGAAACGACACCCTGACCCGTCGCCAGACGAATTAGCACAGTGATCAGCCATGCCATGATGTAGGGCGAGATATTGCCGTGGGACATCAACATGCCGATGGTGTCGCCAATGCCGGTGTCGATGATGGTCTGCTTCAGCACGCCACCCGCACCGATGATCAGAATCACCATTGCAATACTCTTCACCGCGCTTTCAAAAGCGTTCATCACCCACTGCATGTCATGACCACGTGCGGTGCCAAGAGTACGAATGCAACCACCATCGCAATAAACATTGCAATCGGCGAGGAACCGATAAAGTTAACACTTCCCAGGCAGGGGTATCTTTTACCAGCCAGATATTGGCGATGGTGGTAGAGATCATAATGATCGCCGGGATCAGCGGCACCAGAATCGAAACGCCGAAAGAGGGCAGATTATTCATATCTACCGGTTGATCTGCTTTCAGGAATGATGGCGTTGGGCGCTCAAGATTGCCGAGGAACTTCGGCAGGATCAGACTCTGCGCAGATTACACTTGGGATCGTCACCAGTACGCCATAGATATAAACCATCCCCATATCCGCGCCATAAGCATTCACCAGCGCCACCGGACCCGGCTGCGGTGGGAACAGTGAATGTGCGGTAGTGGCAGCTGCTACTGCCGGATCGCCAGTTTCAGGAACGGAATTTTAGCTTCGGCGGCAATAACAATAACCAGCGGCGCTAACATGATAAAGGCCACTTCATAGAACATCGCCAGACCAAAAATCAGGCCGATGATAATCACCGACAGCTGTACATAGGCGCAGACCGAGACGCGCCAGCAGCGTATGCGCTATCTGGTGAGCCGCGCCGGAGTCGACCATCAATTTACCGATGGACCGCACCGAACACCACGATGATAGCCAGTTCCCCCAGCGTGTTGTTGCCGAAGCCCGCTTTCATAGTGTGCAGCAGCGACATCAAATCCATGCCCGCCAGCATCCCGACGGACAGCGCCGCCACCAACAAAGCCACCATTGAATTGATTTTGAACTTCAAATTCAGTACCAGCATCAGACCAATGCCGAATACCACCCAGAGAATGTTAAGCACATGCATAACGTTTTACCTTACCTGGTTGAACCGTTGTTATTTGGGCGATATGTTATGTAAATTGGTCAACCATTGTTGCGATGAATGTCACATCCTCTGATCAATAACCATCGATTACCCTTTGCTGCAATTTGCAGCAACAACCATGAGAGTGAAATTCTTGTGATGTGGTTAACCAATTTCAGAATTCGGGTTGACATGTCTTACCAAAAGGTAGAACTTATACGCCATCTCATCCGATGCAACGCCACGGCTGCGGTCTGGTTGTTCATCCGGATACCTAAACAACTCCAGGGTTCCGCGTCTCTTTGCTGTGGAACCCACTATGTGAAAGAGGAAAAATCATGGAACAGACCTGGCGCTGGTACGGCCCAAACGATCCGGTTTCTTTAGCTGATGTCCGTCAGGCGGGCGCAACTGGCGTGGTTACCGCGCTGCACCATATCCCGAACGGCGAAGTATGGTCCGTTGAAGAGATCCTCAAACGCAAGGCGATCATTGAAGACGCAGGCCTGGTGTGGTCTGTCGTAGAAAGCGTGCCAATTCACGAAGATATCAAAACCCACACTGGCAACTATGAGCAGTGGATCGCTAACTATCAGCAGACTCTGCGCAACCTGGCGCAGTGTGGCATTCGCACCGTGTGCTACAACTTCATGCCGGTGCTCGACT
GGACCCGTACTGACCTCGAATACGTGCTGCCAGACGGCTCCAAAGCTCTGCGCTTCGACCAGATCGAATCGCTGCATTCGAAATGCATATCCTGAAACGCCCAGGCGCGGAAGCGGATTACACCGAAGAAGAAATTGCTCAGGCAGCTGAACGCTTCGCCACTATGAGCGATGAAGACAAAGCGCGTCTGACCCGTAACATCATTGCTGGTCTTCCGGGCGCGGAAGAAGGCTACACCCTCGACAGTTCCGTAAACACCTGGAGCTGTACAAAGATATCGACAAAGCGAAGCTGCGCGAAAACTTTGCCGTCTTCCTGAAAGCGATTATTCCAGTTGCTGAAGAAGTCGGCGTGCGTATGGCTGTTCACCCGGACGATCCGCCGCGCCCGATCCTCGGCCTGCCGCGCATTGTTTCCACCATTGAGATATGCAGTGGATGGTTGATACCGTAAACAGCATGGCAAACGGTTTTACCATGTGCACCGGTTCCTACGGCGTGCGTGCTGACAACGATCTGGTTGATATGATCAAGCAGTTCGGTCCGCGTATTTACTTCACCCATCTGCGCTCCACCATGCGTGAAGATAACCCGAAAACCTTCCACGAAGCGGCGCACCTGAACGGTGACGTTGATATGTACGAAGTGGTGAAAGCGATTGTTGAAGAAGAACACCGTCGTAAAGCGGAAGGCAAAGAAGACCTGATCCCGATGCGTCCGGACCACGGTCATCAGATGCTGGACGACCTGAAGAAGAAACCAACCCAGGTTACTCCGCAATTGGTCGTCTGAAAGGCCTGGCCGAAGTTCGCGGTGTCGAACTGGCGATCCAGCGCGCTTTCTTTAGCCGTTAATATCCACCGGCATGGCTGCGCGCCGTGCCGGTTCCTTCTTCCTTGCCGTCACTCTTTGAAGACGGATTCTGGAGTTTACGATGACTACTATTGTTGACAGCAATCTGCCGGTTGCCCGCCCGTCATGGGATCATTCTCGTCTGGAATCACGCATTGTGCATCTCGGTTGCGGGGCGTTTCACCGCGCGCACCAGGCGCTGTATACCCATCATCTGCTGGAAAGCACCGACAGCGACTGGGGCATCTGCGAAGTTAACCTGATGCAGGCAACGACCGCGTGCTGATCGAAAACCTGAAAAAACAGCAACTGCTGTACACCGTAGCGGAAAAAGGCGCAGAGAGCACCGAGCTGAAAATTATCGGTTCGATGAAAGAAGCGCTGCATCCGGAAATCGATGGCTGCGAAGGTATTCTCAACGCGATGGCGCGTCCGCAAACGGCGATTGTCTCTCTAACGGTCACGGAAAAAGGCTACTGCGCTGATGCGGCAAGCGGGTCAACTGGATCTCAATAACCCGCCTGATCAAGCACGATCTGGAAAACCCGACTGCCGCCGAAGTCCGCGATTGGTTACATCGTCGAAGTCGGTTGCGTCTGCGTCGTGAAAAAGGGTTGAAAGCGTTTACGGTGATGTCCTGCGATAACGTGCGTGAAAACGGTCATGTGGCAGAAGGTCGCGGTACTGGGGCTGGCTCAGGCGCGTGACCCGCAGCTGGCGGCATGGATTGAAGAAAATGTCACCTTCCCGTGCACCATGGTTGACCGCATCGTTCCGGCGGCGACGCCAGAAACCTTACAGGAAATTGCTGACCAGCTGGGTGTTTACGACCCGTGCGCCATTGCCTGCGAACCGTTCCGTCAGTGGGTGATTGAAGATAACTTTCGTTAATGGTCGCCCGGATTGGGATAAAGTGGGCGCCACAGTTCGTTGCAGACGTTGTGCCGTTCGAAATGATGAAGCTGCGTATGCTGAACCGGCAGCCACTCTTTTCTGGCGTACCTCGGTTACCTCGGCGGCTATGAAACCATTGCCGACACCGTGACTAACCCGGCTTATCGCAAAGCGGCCTTTGCCCTGATGATGCAGGGAACAAGCGCCAACGCTGTCGATGCCGGAAGGTACAGACCTGAACGCCTATGCGACGCT
GCTGATCGAGCGTTTCAGCAACCCGTCCTCTGCGTCACCGTACCTGGCAGATTGCGATGGACGGCAGCCAGAAGTTACCGCAGCGTCTGCTGGACCCGGTGCGTCTGCACCTGCAAAACGGCGGCAGCTGGCGTCACCTGGCGCTGGGCGTGGCTGGCTGGATGCGTTACACCCAGGGCGTGGATGAGCAGGGTAATGCCATTGACGTGGTCGACCCGATGCTGGCGGAGTTCCAGAAGATCAACGCGCAGTATCAGGGCGCAGACCGCGTGAAAGCGCTGCTGGGCCTGAGGCGGTATTTTTGCCGATGATCTGCCGCAGAATGCCGACTTTGTTGGCGCAGTGACGGCGGCATATCAGCAGCTGTGCGAACGCGGTGCGCGCGAGTGTGTGGCTGCGCTGTAACTAACTGATTACCCTACAGACTTACTGGTCAATCAAACTGATATTTGGTTGACCAGTTTTCGTTTTTTTTGCCCACCTGTACGTGCCAACTTCCAGTGCTAATGGTATAGTTTGAGATTAACGGGGGCCGTAAAATTGCCCGTTGTAGGCCGGATAAGGGCGTTCACGCCGCATCCGGCAAAAATTTGATTAACCGCACCTAACGGACACAACACCATGAAATCTGCCACCTCTGCGCAAAGACCTTACCAGGAAGTCGGGGCGATGATCCGCGATCTGATCATAAAGACGCCGTACAATCCTGGCGAACGGCTGCCGCCGGAGCGTGAAATTGCAGAAATGCTTGATGTCACGCGGACGGTTGGTACGTGAAGCGCTGATCATGCTGGAGATCAAAGGGCTGGTGGAAGTACGCCGGGGTGCCGGTATCTATGTTCTTGATAACTCAGGCAGCCAGAACACAGACAGTCCGGATGCCAACGTCTGCAACGATGCCGGTCCTTTTGAGCTGTTACAGGCGCGGCAGTTATTGGAGAGCAACATCGCCGAGTTTGCCGCTTTGCAGGCTACCCGCGAAGATATCGTCAAAATGCGTCAGGCATTGCAACTGGAAGAGCGTGAACTTGGCTTCCAGTGCGCCGGGCAGCAGCGAAAGCGGTGACCATGCAGTTCCATCTCGCTATTGCCGAAGCAACGCATAACAGCATGCTGGTGGAGCTGTTCCGTCAGTCCTGGCAGTGGCGGGAAAACAATCCAATGTGGATCCAGTTGCACAGCCATCTGGATGACAGCCTGTATCGCAAAGAGTGGTTGGGCGATCACAAACAGATCCTCGCCGCGTTAATCAAAAAAGATGCCCGAGCGGCGAAGCTGGCAATGTGGCAGCATCTGGAAAACGTTAAGCAACGTCCTGCTGGAATTCTCGAACGTTGACGATATTTATTTTGATGGCTATCTGTTTGATTCATGGCCGCTGGATAAGTCGACGCCTGACTTATTATAATAAGCGCAAGGGGTAAACGTTCCTTGCGCTTTCTTAAATTAAGAAGTCGCAATGAGTATTACTTTGTAAATTGCAGGGTATTGTTTAGCTATCTGTATAACCTGAATGTTAGTACTCATTCTTCCTGGTAGTTATTTACCAATATAATTCCATTCACCATTTTTTAATTCAAACAGTTCTGGAAGAGATGACGGTTGCAGAGTCATACGTTTGAGTGGTGCATTTTCATCACCCGGGATAACTTTATATCCCATTTTTGCATGGATCTCCGCGGTACCGGGATAGGCTTCTATACCAATTATTGGGTCATTACCCAACTCAATATTAGCATTAAGTAATGCGGCAAGGGCGAATCCGGAAGAATTTGCTACCCCTGTTACCGTCATTATCTGATCAATATAGATATCAATCCTTGGCTGCTCATTGGTCCATCTTTTTGCCTGAGAAGCTGGAAGGGTTGAAATATATCCTATCGTGGTTCCGGCATAACCAGGAACATTGCAACCAGGGCCCTAATGTTATTTCATGAACTGTATAGTCCTCAACAGGATCGAAAGGGCTCTTCATAAATAAATTTAGCCGTAAACCTTCT
GCTAACTCTGTGGTAGCATGCGCCTTTTTACATATGCCACCATATATTTCGTCTAAAGCTGAGGTGATTTTCAAGTTCCGATAGAGCATACGTCCTGTTGTTGTAATACCATCAGGATTCTCTGCATGCTCCTGTCACATAATTAAATGGCCAAGCTCTTTACTAATATCAAGTTTTCTAATTTCGACAACGCTTTTTCCATCGTAATTATGGTTATTGCCACGAGTAAGAGGCGGAAGATAGATGTTTTCCGAATCCATGCGTGCATATTTTAATTTACGTGTAAACTCACTGTCGACTATTGGGTGATGAAGGAAGTGAGCATCTGTAATAAATTTGACTCTGAAGGGCATTGCTTAATGGCATAGACATAATAGTTACCTTTTATGAGTATTTCACTGATGTTTAGAAAAATAGATAAATTTTTCTGTAGTAAAAAGAGAAGTAAACAAATGACATGCATGTTTCTGTTGTAGTGATATCAACTCTACACGGTGATATTAAAGGGTAGGAAACACTCTAAAGTATCAAAAAACGCTCATTTAAAATTATTTGCATGCAATTTAAAAGCATATCTTATTACTAATTGGAATTTGATGTTAGCTATATTGAGGTCTATATTAATAATGCCTGTGAATGGTATTTTTGATGTATTTGATATGTTATCAATTTATATTATTTACAAACTAATTGTTTCAAATAATACATGGCTGATTATGCGGAAATAAATAATTTCCCACCGGAATTAAGTAGCAGCGGTGACAAGTATTTTCATCTACGTAACTATTCGGAATATTCAGAATATACTAGCGGTTTTTTTTGAGTTTGATGATTTTTATCAAATCATGACTTTTTTCCTGAAAAGGTCAGTAGATATTCATAGGCAAGTAAGGTTTTATACTTTGCTGACAGGATTCAGGCCTGTCTCAGACTGACATGGATGTAATGAACAAAAGGGAATGGCTATGGAAAATGAGCATCAATACAGTGGTGCCCGGTGTTCAGGGCAAGCCGCATATGTTGCTAAACGTCAGGAGTGCGCAAAATGATGCGACAATCACTTCAGGCTGTTTTTACCTGAAATTTCAGGCAATAAAACGTCATCGCTGCGTAAATCGGTATGCAGCGATCTCCTCACTCTTTTTAATTCTCCTCATTCGGCATTACCGTCATTACTTGTTTTCAGGCATGCCCGAGTGGCAAGTGCATAACCCGAGCGACAAACATCTCCAGAGCTGGTATTGCCGTCAGCTGCGTTCTGCACTGTTATTTCATGAGCCACGGATTGCGCGTTGCAGGTTAATTTGAAAGAAGCGTACTGCCACACTCTGGCGATCAGCCTTGAGATAATGCTGTACCACGACGATGAACCGCTAACGTTTGATCTGGTCTGGGATAATGGCGGTTGGCGCAGTGCGACGCTGGAGAATGTCAGCTAAGCACAATCTCCAGCTCGCGCAGTTCACGCCAGAAGCGTTCGGCAACGGGATTCATGCGGGTATTCATCCGGTATGCATAGGCCTGAATCGGGATCACCAGTTCGTCCCGATTCAGTACAACGAGCTTCCCGCTGCGAATTTCTTGTTGGATGGCGTACTCCGGCAGCCAGGCAATCCCACAGCCGTCCGAGGGCAACCTGCTTTAAAAGCTCGCTCATCGAAGAGACAAAAAAGGTGCTGAAACTTAACTCACTGTGGCGCGTCAGGGTGCGATTAATCAATCGCCCCATGTAGGAGTTGCGGCTGTAA
+SRR10971019.1 1 length=15748
IhQ8XkfjXtShUt5~~YuhhjfX~jUL~lhN~NB}~igQgjRQ_hd[~PX~geeighhhhV[x^Q`S^SN~~khlGY;mLjeOzhWp2rN2z}NZqf[~lf3w>rgfRiWhleY~kWu<~~]7zfj@~k~iF}hBuk`)W[VcReU<miX~iniL|hD~N`S9:+yVbiWck[zUak?~W;hjkhkUM~fa`h8}JfhAG~`i`KegSHZ~iikfY~\Gr0zQj[wM~Z~eTY~Y~k_fF|j-eyHaR~kiG{hVOZiG~fZ~jZ~U$pyqIJOh\~h\~lb>xxbIoGZa(lXZ_igUYJYDUiY~iZ~heEj4~dii`6wgC~~cV~Z~lgkh&]jT~jcI|K~~V~QiKIJUWpR~M5diVihX]hc=Dv~i;}cJgiiG{ik?~~hekL9n[4o~TeiSAY8eldIzF}hUklOjSJ~U\~iSQhkUh\~X~jhUti&~OMTQT}1~BsgkI~jijQfjglUclY~klfjgYHVxW~C{SUiiK5~PiR|8~h[TG~]~X~kX~SOhQ_BuPNjQOiXhgQkY5~~Z}^~leV~OIiiiijmC~}WblkiN~~jgN~hjFzcjhhTgfTQa8~heiZ~iH|^cF~VpJAhcUh=jb4~~{g\~W{i[~UhE~sF~~dO~~jK~jgLiSjE~~hi0~~~q[h-~~~zNhe7~fg/b*GgaiQ~~W~iZ~hZ~iYlJ~X;t|d>\5z~~~~xiL~~j9sfPP_X~fjfZ~Ly9z{{Yo?~{MihlI}djhlmfPN~~XI~_B~h[aSg+{clx~ijkW~eGSX3}~~s[iP}9~~~~~hiQT<~~~~flfWhRI}ggb5~|bibg?SB3}~Ohg2}~jcgVfSPN~~M~X~A~qQc[7jbMQ}QZ~ijPhjgBb^2]ogjhY~iQGuueSceGzgJjg`MhTbliK~G}ikf\~L[uMghiHY8{~~hF{STmjkfgUiIrjhkB{~ifjgH]cEdxG:nzfijCmh.bXffRxPh\~D~fEpU~jN~~XJ~xMfi6~~~v`:~x[TItbXbhL]i6}nG|iiVgiihhlR.~~~eF{<l}^>~|HX?s~lgXiCz~`Buy~pD~~nKbS`KhT\^MD~IyWaelU8~uhiMH1}~if5~~tQ@~~MbhlUQ`H8{KhaFaT/wwb<B\LhjQ~~ffB~~jfa'yslj{cN~s3}8|d.eePf^ki:~~v]g%xz~djiiX~kX~lkdI~bJ~iiimL[~h_;fil=}}kgQ_2~igDhefk[~c1ibI}BGCf%OZ}fkiVVAx3~Ktj8liNu~gYmWM~Z`A>nnI}qC~|~EzaDzRC~P|ZO6kOYG{eQikfPE~fUhjT],`MHg_gZyikWiE~nJqcD~UagO~fYrT[dVfC~k_PfPif_fkTihO,PGR_aK{Y~V~iMfG~Vpd^+~igiiUglkV~<~seVwkf\~jegkhi;~hiiO|[P[ikiXvJyX`%^(B:jihM~~MI{gPM^I`G2v6z~~~fY~hhk[~RiiGwTVUcmM~gjUcjVX~G~icC~NfX{P=QcUkfJj[5~kefPZSjImiRePjijM~hA~~heBdR^d]_:L\bY_;,]Y~dQ_OP~j-sNWUgMTfjILIjjiVcP~UjQ|LKkQUQaD~~iaHsi_^<fhjK{By{fSMxRIwjF~H~[}W~jh\~fO~LEMH}EveY~h^jQB~T9f{|SfE~U{3nhm]~iVD|;}}}N}gkeZ4UhVX~iYkDd9hgmhUStH_gI~SrNlMedXUagNxjDPH~i^EtjQBuT~leO~x[d_MkfOpViNh~jd=||i]QCyk_Nw~B_\>QkiiP1{~xvjO|ijjii8~~~YqYwegfeMrh>~~TcM[}fgUeXkdC~dPf_ZrW~L[>~~eP=hfX~FvW~9~~jN~~iY~Z~jgL~WkdC~U@}}S~kZl)~}jg[SkX~?~~kg>upb_XObXhgZ~flIJ%~~aOXfZ~Z~WZ~iFyG5NSmN`Uj%~^L~~])hjPsUkX~jhi?~~mO>~rckehleXkVlD{+AD~~[fijkT~H~En}vN|gR>ffhiOUT{M{@N9e>~~ycK~U\~mfA|pViF
~jOngbRfflgjUoPQ}ghRAL]Q~jNM]hfTgSPIPpleR~~lgjQ~OagjiQX~jiX~kA~~~rQLnI}ih^UZhG~{L}hf>||SfhB{~}1deO~~UjbhVrLY`IgjUZoI{LhmhfkdMwb'qfddTLTUiiG~VpO~~[|jiPSh\~iSN~J~kkgZOrjL{ehg5~~fgX~7~~eX~TC~~h\~jSC{KhRVgh[~kilRgiUH~~lhlhiIp>~~jgkkhjiSgdI~G~rk[lTM~sD~jhhC~kH~STlkhEcLxN]N~pfV~1zbSz<~ijiA~uW{X~hgWkehkH~W~QY~aI6f@eVjX~cWeTLYI`ZQV~:}edV~UieedRdJ_Xsj4vx~thhlkX~haSHwV~O~~kThN_V~?~oPY{kfK~jh[iY~iigJuh`U~XglY~giYigJ~VciZ~kW~N]iMMyC~d]~Xl'WhgiHT>FzMNCl~hXBPr@~Y~S9^FSiRjgiPhhaO;K0}~gaO`Y~ldAqph7~~~~dU~heflBmgH~cJxfeTjjM~~jS~iiiLT.~~~laekUW~hg(||H}*vkX~llfi>weOAk.ex~eQ>=LRiV~Wgh?~~VuUiY~lkf?~~~glkjjicQf.wQZp[i\Q1~|[kgUiiG~C~j,z~~rg_WWWNmjgX~[~Vihhh\~iF~djijkX~jN~~jX~V~F~G~dIxff^E{UIx.zzNsvbC~w`MzRhX~iL]7~dc>{D~~fkijI~I}hlMW~gXnF38TyRTE}~~Z|SkhgiSf[~jhiiP^fkeeUXU<`W~K}ROkefY~j5G[wN~~jiikVghMkTh7klQg9eJ~V-iRjhSqZC~SJlaa@WffD~G~hSKe7|~~p[~UmfOE~~UglkgZg\SIS_7e=uijjiiOQgigiU5tTZQ~~iVgN:rmSgIp>UzgiUf<jF9~eSagQD~bYyL]Y~G}ZzO~~>~k5xNXpi5bv~ijmfZ~X~fVRyjGwW~hMmqkiA~~e9_dZ~Kt6jL8~QEXL~fgijkilf>VJ}djF~`eKvhf9wcYaiWgPG~jN~~iji;ygeFfQG?LTNo_Bmx~dB~xpiijkP~~QhkK~c4lKxjXzkileB~~~|Z~9~~~uaPW~gi\~j:~}g[L?lJvNANii8S>~~giV;~~{Gs;>nFrhI~H|lHpdTfWSjhjkhiZ~Q~~WvfWf3~~hZ~dfki?~}VcYSiQ|ijP[~i<?igK~khkC~f`Ikka4~|NtgfS_3~~~~|dNk3L0+CJ<Y[~ilkiG}M_`\_>~~xRQ|ilgkeY~jSZ|C~~flVcliYLH}Qf@vXNc\:`uzZukZ~hkii2~~~~fZlL~iVhhU7srbRWtiKc_LaGygO1~a;lOK``4wp~c/~~~jH~sCEcNY{]~^1pfZ~d=~pUiWYglkikW[RReB:UkgUQ~~i\~jZ~iUzB~VcibF~gScKhhSQSUiilTRg[EixL}khk0~_Qha:F[gQY~fI~=|zWX~d%~SP~~giI~cUVgTe`ji]TIQ~gj]OZ~dfhUN>z~~PfiL~jYkX~RK~NfilPiP|X{igA]N~lii_hgX~QF~SLZii_AmRR@^aeK~c[|Y~Z~fkX~eF|iEkbV~jjihg_fcS=fcO5\RI|gTc@u0MllaKgjQPiRkjgi@o?^G_hjjgGBo~jkeV~Ta9~}bkiSekYijgdF^bdVZPGqH~jUh\~J}iUcR_UV?xhi0~wiWDrrYtSgK~FoL]]~NhUH~WNcUV|J~d<H{Q~~UahLX~5~~kTcQQfgfdgJzmSFaTHsAt~~~jK~ede=ue[~gj^jTjg?~bB~wW~jfiPQ+8G8myGrFhdgNVwW~igjL^Hwjc?v~gihjRcPS~Pg[{SXxTFiRF8~eh8|ehfXeYceP^NMS=gfkldU~jUg<2q~{mVH{~fI~WJdjRjgkDHg<XMsgZ~hUtigQjhgNije[Ul7g\~bC~h]dC~F}xW~RfiUa]hS<kV@EihWuO~gkfj_hd;yyjic7f:w}rQQk[LzehZtQd4=klPh^TfiSifkY~I~Uc*uThPQ`USTeQehggdJvPCq
\gVPgeli=~~ehV~VP~~i3}}}dS[@QUkY~fjSkYUK~ZbkRdd5uPhhgiXagfNwMSPeiU]~:~o`'z|h]F~a(Tiijjf?~~giH~jbeje\Upci6qQZ{>~~iE~4|~fPY|fkV:qeikiiiiigfIKnhSST~jT~jfddCvNQihK~ejQPqRX~jji%djM~d5gVrRfijjZxS`eUOhildNT7{~~~g\~`SUP~~hU9~jimQcijO~~hO7}~thkX~jfdSI~N~~melfghSRG~i5~H}cG~hiCk6VXnSOZx>eDY_SgZRP{5~~fO=QEI/ePSy-~~~~qPu~hh<~~~qTQ\gf^SgZeO~UdlL~cXX~[|SlL{X~RcG|RfX~h`W~jgkgiNxTReL~fikW~ha8zjY|ikkbA~gjkZYdO.{~}SoE~~kZqZgkiAjX/|}}{fej@~ii~i`MZ=tvlKgQQejiIfheFyjjgdSUhikdV~igX}XeC_ddffM~O~~ijiijjffljfhlfhiji5{id8~QjNc`Sje-j{phiddMeCrhLzojkikiSU~ii\~kkdV~UIQgY~jgO~kcJT~fehKmaC~iX~I6{U]okjiPzeLjZ~ijg5ennV{Q~~kgjThijgd3\cRjgjgelliK~iiY~LM{E>ejW~jePSf3~|N_i_V~UhX~S]7~gNukgU~=hC~eA[cEv~^ukf\~hS`hg]~eeW~Y~H]ihY}gO8~eH{mOKi?kjFw|[~hjX~gQaMR|hE~hhije&^eS~iiT3}cQ8ifjgkTDsJSMM]TgJl}gRdgE~eeOHsTbjM~~g4ug\yH~~m>~cfhejX~ihkShhHEQ*eHjGyFoSV|hiX~iM~A~~ThhghZcE~ShX~eUjgkTR}ggX~jihOhkd_hhelfgI~biYhP.|~~lLhZ~WXuW8~~~fAy=`P~~ijkchg?K%@geLSOZ~Mb2QXTU~kg\~X~?w~kffZ~fgg0_ScUiMG~NbRehcJdN~~kiif\]TGmjMViekiIkP`gTO~efhhijdWZRjgXhg]f]iTPg9jW~jBof6ZtikiiURZ~jjeX~kgYz]gSghbSY~jOQaB~eOdG~RnZejS~\ClW~ildhiSjhfgkfVig8{~Y}FlOhQN)Ed?~~~~hV~QhihX|B~xjS9i7cNhNcNTOU~ihYkgZ~fiQRbOJOJfEFi:PVViLt[~hjX~jI~hiD|T~jglSTyCmhfSeYqfjSekaQdTijg[~W}[VKdiihgiYygiD{dO[TQ\hFtO:y}uRbMbT~JhhRejUdHyAHgccj*c8[L`Gu~~jQeZ~b3w&ddhRilcjSzPQ~eUh@hXOo(kkfjSU~>6uiHP2{~|eeU^h9~~iejki3~~|SpaNik9~~~qiZSgjPPdeIegYp7O]1m@|cdRGmKCSffh[~SwWgh7sZJbTZ~lelkB~u8z~}ieeOXiL~SfdV<qj?fJxWoPiMwygW~k_=TiM~jgk8hi(zzRaiWRrjcjCkI~dijX~idD~c@h@|TgcikJ[,y{tkcMVkhfcQM[DW_]e^=_w?~~bjRhQR`J~LrlCf>bv~fih<fiRhiUNgfSeiMlWaLTqjX~jIjM\~i[eW\~fK~hC{~~lLhMOchiOO`kkbhgehejUgY|PKV/:A\PW~9\3~QFpSiDzUiFUZV~WTRR~;~zZEl[VMNjX~Y~ii[v;ingePMrtF~|h];|kfU~fR>ofd.zghiYVredDzG~~~ii(ijkfhcBCOX0hM~VcJmMkkW~gM_Z~f\}dehehdgGN5`O8`PNZ~g[~IlA{}cfihMkldfigiGxT_S1`?|L~&zyf`jgiiTeRSd1uyc>olBu5|z~~gjk4_MQbdSQhe*~~~~WSeKvhSW~kW~igj[ITchSh7z_GygKNjW~RUeOjfjiiD~UgfXCS~hQUig\eb_Mfa?ye:jUqfN~f?GZfeihfIzY~RV[i\~kkWqJ{X~@stsaRbKDxQhPefh@gG~K~tt`]Z<na]=qZbPq=p^?kD?n_ZTtOvb]b]^T]WD^Cm^[Pl[Ri?]%21fha]aI:nrk
ZW_[NLY7nnZHGBmIIp\IUeD?`jI__@m\bFtrEQ&T>]T3opJ^_GtaT,~T?J[2`H@:ZGDId`]0f;OC\^]9DlYIV^St]9tusa\bHoGE^LTV`:lWMn_Dpv^_^FuuPtaZbZZaRtb\]>q`]\DaYD_1oonMaFuua[8n`fStOt`^`a?nWbb]b<o]`]_>utb[VOtJ_]_HUj^H[TK1f/nh3*XVQQbEsraD=7_`]@b-npmE8^Z(dkOu_G3pPJjb\FF%RUH\FITq^]cKSCDSb+qrd[:a[``Msb]?kQXOva^\RpTt_WF^G@{YPt`]^XBXaOt_Y1W>IKGP>\GY`^=su`Ot]IX=qp_\]9op^[_^`\Tt6qbOvb,ha\UlJvb]b`b]bMt%;LFJRJ]F]aHaXPvbY[Sp]^`]bKFFg>Fb\aNbWH>mVPbaAoGsJG\cb\9S3_oPm]`^_\`;n?sqD[XCRGBFKd`^SpY];nPo]\E)LGOt`F]L1$mm[Pa`@tta[Yc`CmI_N__GHUq@vue`<sIVbbCsSr^OvbZ\J_`YDsY`G_Rt]HU?~XG`aCqTt]cQn>R;BaqPta]cb^]GvvbKH^[^OmU\`FubXE_(j[K(9ZTq]H]G[aJRbNtI`I>r]^.oppa<h@>ruuZ?h[JE_8uus^WMSoYYThWP]YK][TZ>t]`^^aY[TNtC8iVAXYTt]Pta\_^Ovb]bLdE^/pqoJUYUtKVLu\=oJ`]ITsRp``F\>rGaYZa]IV^Tt&hHEPsKRLJlLaYZHDaZ>^Z\bXRL:pq`OuQMq+Z>nW]`KVDup-qNmZGtb[=qaYY\I]Rs\\bZb[YR\V\`YOp_\aYZ_HQ>f<`o]BuZ]Z^`]IStUJAo`]8qhL[G)Aj\Vm\Ipp`[1l_\`\L_\_^^`O]]\6ppI^Ovb<Kua4ulZ5Qp^NuM4STtG]@m^]a3rrhBj]Qn?UaPqOF.fUJ1]K;gmLJQp=pa@PS2W`XZ`G:^Kp^W2sjZ_]bBulQpCl_b]IqqUJB\a^PtaVO>suu`4rrmSmMBr`DZ]Ov^aE?rtaF]`_aFumG`<qtm]SqZa?r_]]BQ[bRrNq-m>dGCs\aYZ^_GI^VGLX^EHZI=An]]]MuF2`Ov`Fr[CEobOtFF[YD]\b`_AY0orrqSpFuh]QaXO4bnO,HU[Y[,laJGJKH]N[bSpZ.qq?``JPl4AKGDYI+jqW]3W]\]GV3nU>ep\aInLNA]]ZOfMQtIaLrZ@mIVA_AVPrN>ra]F8^d\@fgZI_]MmTG[Y\ZbYMOYa>ugQ0uspV6k_C>Ts/nQDZROv`ab\TtNqFm?Rp[V]aSt\8V\ZG\[PGTZ[?fKtaOu__\aOtaXB`]^U^]<ttEmHV?ssZaZbLi,m`\?orJ`]SpMrGOn\KJZPJHNQ:GG3Qp_\`[>pHOu`^_bS[B>0hjEJI^`aQZK]9,32M=dV=~\_``^]AmQmFX`\Kqq^OvUtKV[\HRpNtUrSqW^Re\UqV-roIPlSB)[]*qsqZRh3qtn@i]AuvtaADc3Da```AmDpO99;\Lg]^6X]\IaHvv];P:G,[//YaYKWGdNtaX;b]]\cb\;ttqYAI^^^bWWYCo[2VcbPvLY]^a[VCiD``\a]F4B?Ib\\\]^Rt^EoSpYVb]PtbIjDU_@tC@\JVArR)\O.[JGaHGefH_FtaQwbKR[D]Yc`[8p]2839C^OvM<ghBs^YIa[b]aJRWMqK3maHUOebZFX`]`HDtuPt`]]IaNubb]]3m[SjMub]a]?>3encX?rt\bbNta>sjZ6pmGa[aYMv^aZFqYI`Ot`^fSHeVSUTH$RQEIO9V@T>aQTB1dgg[HcSUM9MTHfRG=`^:daSV@FeJcQRTCGfTQUTVUWFQP<RV?SI5UUSFgVRU4fV:<aeQTRTRT6e_RN?:^KMTSCb/aa]UVRBRcTHe:`CHTU<b[YHe;9ce_TFN%OPWTO;KV:X1``ANUQ.bbb_RRTT-G]QV@NWUSRU=Q?NP8=BT>aSJfCfVQHeB@SD3VUSWI_2eeaSURTTSJc
QUS>RHfLEbVVQRVRVGfUHfVVQT@ccFeJaRU8N<2CCSTEEPRRVHbQQBMRN$Z0KH]9f_XM6gQSQ>Q'+^bUSP>TVQFfWV@ffQVHfKbBffTIfUGfHbQR(BTRTVT>K%R:`?PO>P@ccRTU@eVUTJfC];fBXcJUQRJcSUWTJcQQVG%5QVSLDN:SFd@IfVRUTG`5a39N4@RM*?\7MJeTUSHfT1\UQQFfTGe@VP9TRVSTTUHfVAdbHgVIgVF]WU>VOD>G3>JeUTUVRBccUTTBccSXVTFbL:TUTIe:eA;ANUPSUTTVVNR8aQSSGQRUTTUSUM:NcWSSRTUS9eP9_SQ1``RHOVT3b?MUTJcTTP6`LSQVT76:_VTJcJeUTWS<UTTTUFeV+ETWJfVO3VHOF0SRBccSBNIbTTUTTVR@P@MSD*8GQ9bRA7_WF<RTI`VRLb>W>T+aYU;fffRVSEb:QVU?AbcRT@SVRUTV?ORTP:\NUSUTHDfUTFQ$[O?IVSVS8gUPDFcD_PHcQVVTUGfU@gfURGe?AGgVVHfT=SFe.LE`UU;+RYQ@T5`+GL.O7_^?FeG)6TaUUSVU)cc[T@_QRJcRDQRUIeFfVTUTH]B6cR7(RQS<VQ;HaVS&KHTIcAffTVVO%KGeM9BSQ<GcVVT-eZUTJcTAffQS?ffU=QUKfSIdVSUSWTTSIc:^<`TTVUST;BbfVQFf@JTTSSHfVSF>'``<GHUT;_QTJfTIc@T5`7bRJfVHXTQGd?BVPTSO)RSRWSVQRT:eSVRT<^>dc@VVSVVPE_QQV;JSJcQUSQTPUS>fe:YQU5`Y[PCRX5^'4P2UC7DOQJRKfUAHRQRVTFQ@ITGfK*_QHgURTVHfQ@fO$iOOPUTIfTTQ?U5cR.NUFfVQRMCT>SQPSTIa%[TVVTHcP=%UaVRFf@aUHfIc3deaRRUFfVHfVHfS7gW*bN<_U:aGbPQS2YSR%OHfVRUUUU:f_FfHbRTFfVRURS9gRQWVTGbQRK0D5fffeVVQFfJcTTVTUUH]SGRJaTUIRQUSTV@HRUO$;E8AMPUH`<KLCOA@@=AKQ>LB/@LDOCUNLBHTTSHfUTSRJcTUTJMEeSQWLS2B3?QT+>7ceSKdTTT*QTJcTM7@_TIfVREa>RISASTURRJcSQK`'\\\O3B=A?@(==-MA2cSHfRAdbTT:efVRLJMSNUTVR;URRVTUVRVV>gfVJcQ8HBccT;aNAURV?ddTUU<XSMBPOM%QM;NH;PGeU<f\QIb?J`Q9ePIfJfTT<ffW=BbTJfS6gQ:eVSORSHbQRVRVRB@EH`GfRSJcTIfUP7_M7^_?QS:HdUT@c.G<URVRUTUAC5TOP)c_QSIc+XP9`RG`>`TIfIcO:eOEaS,ba>[fVTTUSRS9cSUSVAffL\FTWVB`JcQG[N:bPVUTUTIfRLWJPTRV:_TLO?TMTEfWEfHbRVVTTJeSAccBffTUSTV4X<dbTVQRVGfUGfHSQUTSRSQUT;cOKfUIfTUTTP>P0D=U7a8P%ECffVSRIeKfW@ffJfT@OJTSW=IeS;9*dfaEd5HQB[TJf@USVTJcQWTUVTWJcTTVTVGfVTJcRUHfUUTV>\UBSHfE`R@eVRVJcHfTUSIcT4^F56]R@B2BAdcKfHbSVSWNSR9e^TWSTUHfHdAbcRTTSUSWRAffQSFf8ffffVVRT/XEdVPRSIcSCgU>TSHfVRUHf*_04fVAST<GcGa5dcV1bYJ&:RTTVTWJcSWVSGfV4eeVSVGfTTHfVQA/5/_M9_SVTTSTBNAFAUA:BacTIfU5bFS7bQVAUHeW=bSIfTDcQUQTXIcUV=gdUTT<^GgVTTVVRE]QHfQUTUUP4cIcR<ETNT8_VTUTWVSVRQAffQXRUTJcIeKfVL>QD`RTUSVV;ffcAc^L<LSS;eS<`RTSUTR9TQSUHfMQ)MVPFdTSUUVQSWP:TTUS:I@QVGfIcSHfVSBOOBRUHfVRKfTU=MIbSURBP'@GC>QSJfVRUTH<C?IRTT?QN8Q
HfTBdcSQSTN:P9`TQ8IRUSVSQ@TST7P7JBTGe;T/G=eZPFUUSKG8TJfVTTINHQUVVTHZ$PS:9JVRGfWS,dWUTVRUSW5PAL'YVUS9caP<VS=?Ge6pQVHfUKfSSTIcTHfQDaS5ffEb,SUVQFfVPPFST9gQPTJfVT8`OS<@TQRNGTSSTHfRTS4ffQVTU=QRSTS?G_QN:`RQVVRGe@TQEeV=Je2U1e[VTU4_F[:6`FeUTSVRAccLMTH%FQKfJcSV9gbURJ`9@bSRS9JUSU@ffQTU=SVTUU;gfRT=GeRUSSU=U<9VdQ@BffTJcSWUJf<^L?A`fHUTT7iSKfCaQO?^HSSTHfRHfQWU0]ASQ@VSVSUHfWSSJNVR2bXHfM=O?BJbTJcTVTBccTT<IbUVHfIdSS7UQU=UGLZR5cc/LM5VRJdP,HJTRN<G?QUHfTC6K-^ESCM9\QV3cdPLP/JTHfS6gW>GY4efcSHPQURSTM9C`8dGYIaTSV=`KDB?VRBSUSTVVGeTT9,DRB]*^LOASTTU<UGfRVQQKfVHfVHQSJJV@Bd`:89dVR@cbQVTAO@RT=AG:A\QUTVSTDGeRTUDY,\ZT5cg\5B8b-QRVTGgQU%\USVRIcSTF`??YUTUVSJfTT$8MRSSVT0WQUGaGT2ggQVTTJfSJfTFPQHfRIeTRJcSV?C?:X`H=HWGd5K24McQ:]Q?GcST@VAd\<TUTVSW8a4ZTJbRUIaB?NWVPDP:H;RCHQUTTIc0DB_Q,bS2b^=F:N@]bQDT:dSTR3^SVIdJV6`SVSWUTR0HGEcJcOHcJcTH\@;SVPVHNURV?SIfTS>dVTTBIbR@NC::T,c]UBffRRJcTH]5IUSVDVRJcIgVRUEdLRM<QURIaRS?6VGfGeIcJfU6ZIc=WPJcSWVRTUU8U]T4M;TUHeUKfUSCfSO4cVQGTTSROU?RTTSBZRJfTUVQRR6gVUUT9H3\B\,@dcTBccSWURT@a\@VVR0_UUT<QUSDdIfUU?TTVRMaTUTVQTF<=+K9eOV<fffVN'UaVRJcTVTVTVRHbQFe<SSSTSUTOBbcQVVQ<TNOTSFS4f_PCKI^TUUP;UTM9Q/ccaVQU;?QVTGgQKf1LIcST@TT/;FfE:cSTTKdTVQDP>I,EWVT;ff];LST4ffffbUP8VGfTSTWSTVSVRJcTVTQBHf7fffY3_UTVOBONR/QRJcTSSUST0cc`JfRDTJfCgVTTUT1g_QHfQVS=%4VTVJcHf>f]AVTTQ9aTSUVTFeHfVSKfUUVGfUTVTUTHfVTSCVJcTV@fcNAaVTTGe>RUUTTTTEfVHfV?FT=P+ZL;:4=FTSTS5K'2H<QTO?5ZUGfQV&_><UAOBRD`?AS<VGb;9N4_\?+WRRIKNGfRU?TTS(SSHf;beH.Q`Q@RST&RSJcT&dDV3MRPS@L]AO%mQVPVVSV?SHfF7,&F?K?VRDTITVQR;SUT'fffS3cbIeTAffRVP7`eSUGeP?@0CE6ORS<STUHfUSUTVQ><.Ob]UFfB6MLBP9_O):G]?]G@E?5V50O95`dP5SGe>SVSFd7>R<OTTBdcSBMTUSHfVGQHPTD`@9i>8S?<]]]P>S?M6`FU0aaJ;=URGP@N;J,jN@7`SEQSVP:cHW6cVHfVTF%MIaQQAceHcJeN8KH[1ccGe@S1I?9PKQ9T@e`SM(_`AUTR3aaJ<KST8UPG`TJ_6@VR3`3QM4~8aSV@MPQ0GLO@TMaTGb:@U8bR$mVVSKfHfTHfWVTAbPHdT9e>dfUU4ccQU=`JNO(>PMIPKPRN7a;=L:TUN3tON@Jb::dUT>gfUTTUTD];TVSVSVSSKTTT7c$.=^K@IHbRTT=4beeTSGe>PWSIcUVG\:gAKbT?HfVHfUTSP><e^IcS6d;7`SN<LIbJ>JM9R9V=b`?I+H\@VVSD1_`RVH`GP7eeQ0aaSTTVTTRJVP&JBHZPEbQ@T9`:aUTS8CU;S@TT7`HdDaFfVSVPDQGPST7cSTIcTSUTVTUOR>R
TJSTT<URUPWN,bbVPNHGcASbFV&UTOSU7K?VRJcSV@JK-F`LIQS>YNVQIA)^TTI]S?=JcQQ6cJ+_SHPSQ@$RJL;CHWSDdQ0aaIdQ?@RWUSSLQJb<JV3WOBE8C78ZWQRS<@a<RWU?HcC9YT>;Q:e_3MM%U2O9\6O8E@G>UC,^SVR?2_P?ffVSUDeT41SH<96Q^U8dTTEbQ;cE:(;NG_WVS@8bX@EDF1NDfVHe?N/Y9_NQ9]:S(Q7gO>TS;MQU.VVS?ffVTVP:TJcR7AM)0W`QU>O)FHF?dZ:RSS>7cT4^>@0affcQKeTTAR@;PPKJ_SBJTHfVT(_STT)aVI'_B^VTVQK<KTJGXT3\/,OLJTRQ>LQSAPM(S=IdUTU4_PTU>UTSDR9L>OER=[;*H<OL?G>C[9R4^A?TT9QQ4V-J;NO:M9IdTTO:eUVHfVS3fO7QTTUS6edFeIaTTDSLN4ddfVR(ccccb\US@aMRK`SQNSVRP>9W.UTIW)WLI<KTR:V>)Z>?URO7e^TSRTKfHfU1ddd[CTK>.`bbL`TAdcO7hOURFf:bHd@VGd$Z<=P;eVRUT7`JURV6\Hf@+accK3efPT+aHdIc=ANIbT8dP<cRNC?8Y:LI^RP?-a^VUO;ELG%VQTSTVTV?f`:L6`G^S?aPHa6cQWU&P_`UTVTT@VH=TRS9A9RTTVVRU@f^:N>QHcTVQJG`:%WPECVS'S.4F8;/IcTJc4Q?QVKDS>gdJfTVQQAffTP;<_KTF6EUTUTVGTBQ5B&35G:AH?BFGV;UTTUTTVU@UTS<?aU7TWV=gL/[[TO6`QH^'bQVSVTIV,ddW:A8`J\V9TS5@NL1~T7hOTUHfTUVQGQ6HRHbNR$]<C@MS=UPQWJcHfS43/\&WMOGTTSVGfUTS&FI^7`QFb1ReaP@6TJ)DbUQK6><3=AG*HN2aTJ^Eb@O?QO,[5IVT>7`SSQ5e]QKeTT?fM,T=Fd65@6``LMMH^:TUSVVSWV,ecB8G6PLVQHfVRGe@A4cRSDMCLQCJ'NP>(bO+C;fT:eJcQQUTSTHP3`aPKQVQQ@Q-`C9TFeKPN'JOQ@DOQD[9aOT&RUS=%R9bN4WNHfQBEfUTJM@:bP>QVRUTUSGfUDdSTS9I2~A`SV:MQW7`B<cU?QUF`>-AU0ag]SGfVDcE_BDTEcCG2OKN:e@T?TQRE_;RTSG\&STN9-HHaRUNRC9SH;ZVPQNG6E*V;^fVTTW8?F3K;48PTVIcRGT6[SVK&j>`VS'aWMKKIJRT.PLFS<;5bALHfSUTHfV>aTQRDULPQ>>VUSHb(W]TR30a7`;T9bTSV)ZQTT@AJ5_<^=KVJ9gRN3tPTVS+9R/`iS4_K^RTTHfUTSVUSO.XaO@KfJRQ7`IdTUSHfTU*M/,S56<<IcSUTT%mR=bRTFaUGd;UTVGf&R]CE6[dQC(cG`IfSVS@6^^F<e+XTKf=fR7N18VTVIgUT?G,S;eSV5EQRW-]XLUDD;7bWOE9WJ1chRQRR2VUSTVSVAP=_SRIeA@&eRQVARV@NKf>FTIcSHeV@N;eSCdfSIc+SUSV;[4hCaRKfGR<`C,@RbecF^=eV@ffUQM2~?RHfHbQR:?TTUT4^7hOVQRSM.URUL=L4feNSLUFWH6.38\fR<MBUK=_K/__TUHfT8ecSVSX@bXM.eeeZ=VQL<cS,HU:_SV:eGSEM8[cTUT4N>:eT<RT<C`Ge9S@:a@ZeVVQ,`TA_PGfTUVIeUSKfTQWREcGfL>F`SGV5b,@e5dUVSIgVJcH_6cQId4hdPUT/:^caLK(dSBQSQ,19VRTON?TVU4J8R<MPO?TG?EeV4VIfGd29@TUVRL\EPK9aOW=+]a`C:39PS;S6_HSS>]Ga>bN7iOAffUSUSU8aSSVVSUDJ*`DO3aVRUE]4]IFe=eTDJfTR7gGbP?fb$YQTHfM?7YT+_ZT>UT7dIQ>>@UPSU.SVGG_UQFfTVVS1cbAbgUUUJfU<M:ddUAf_6`fQ;IbQN4ddIU5mTS
TUTCb:T<N><6^QAQVRUL,M+D/4PaU@TQKKRT7]PG`=QTSS6[XWSWUT2d[PFUTS6bRS@2dffTGcN;e;aSRWPQ?QVPWRTSW4OAA*KI5lORFfUSJcQVB*FcTKb=TTSUUGUTTT3bRTV@OUSU7f:d6\fQ6_FOGL&,?D+XcTUT0IUSVVGU;&cbMRT8c,a^VSTVEWDRT0eeedH[QV?QVAK3l@7?P<bR,I:8TOUTSQVGfRSGfVHfHfRUVRRTRR6fe^HPTIaVJcQUTVSQAVTUS/\dYFb8deSWSN>Q*dbPUVRTSL??SUTSS7k0^`QTUJTIcFfDcQQ?SQT+US9<KGRS)b<6e`?TS=TST6`?cfVRMbTUJfCaQ7`VNVQ:N<LU8dVRRKfP6`1be\RQ>SJ>S0ghARHSTU6F5UfPS,c``bURQ@UAd^4dffT=RTUQSU(aa]OIR:H`QV8IZR4eeTTUVRUT>>1>7PeTQO5mOS@MCETUR<R7gPNSCNDRU=J<KTU@Jd2ba?dfVRFf;cfS5RUVU1PUQV@VT-fffffTTTT@N0PP2JUK:eD3\S=VRUTBC\BeZ6aPZ;H$SQV=BVR2JSTT8QHeHfIe0fXP+SM*D,9;<SUSEWQSGcU<0RPS=bB_UH_8IIbBRUTIeS=dUR(bcbB;TUTVS4fS=EPSTL+_SVVGd,h]FQS;eB5caKM(ddOTHdFe:/dgA8bWU8cUSRBACdWSTTJc@8IV>d>ffU8dT=5^4ed>c`RJcHfIfKfGfJ,P;8J;RFdBSBC=fPDR'`bZ?NVTUSVSV<YSTQ9FVHfH]USG]@MQ;eCQW7MTQ8S-`+Wdeeee]QSK$ffMTP?T(F`bV@WT1c^<URUTUT0eeeebHQST7gg^&:?@MSV@QUUSK\VRUQ%XTGfPM5C1]'bcbSTST.bHMTOCSSQHhTKb5L4cIUSTV6TSQRTTUTUVSTCgTUTVHfP?EcU;ffc@cc6Z7@`T<SP6_-bccM)R;ANXVHbUSURQUHfQE@bb?^1H4~VQAfgV6N<4`SURUSTSFW?9N:OdRCVVQEeS5I;7;R-cccP?S8GV<TVGfVVRT7hVQHd;QP%cddX3@bST9PI5bcVM9S>DeU.ffL7RV@P<TS;RTVSVAffVVBc>@BRVSHUS@REI<I2>MQ/N%a`dX:_IfWR:FUU>KfVIhVRKfSLdSV=;6_KSFeM+dhPEMFfT>TT6XPJ@QUHfT>]IUUVRU8V@WcTSRVTTS3fdBQSTUGc0M5QR9@bUSJfT7`SUTRQUSTVTHfWQTVRTMO2vACbfVQUT?;8_H:E?M;bPV:VN,l@TQCQ*a'K;daS1c[:HaMSSWSTSTHcTSRTUUSHe>;TVVQRIcIfTSRU;.[UT;RTSOF_TRTUVTTST:_HcTT>GUGO0ZH>TVTSDfV=R>ffVUHfTFXVJdH`7`VTVQQUTVS@TTUSCfSQD]>@=VCRUR@cRVDM+eUNB?NH;S@?QVSU=N;eVQUI'ZKQHfRVSKfVHfU9cTBggT'RPTQTV@ffRT6iVRVI_4_PSRTVRURGhHc9J9dPU/__TWVRHPQM9lVSW/a\S@KfUQQVRRGfURUQUKfB[[TUSSV<6`2ijTIfK*^A_@THfT;=SS=8`GfBMRJcQFfVGb;7a`PUQQ&]RV&TSSAfgU:[IXST9A^\6]N];ST?STUVRVT&:+AA+BO-cffdRFdUTS1I9fI$6gHdTUU?RKK<4ZSSQ&JU=?dUB@<<I5aA\KNGfVS<9ad`QHRWRHeSM2`EFG^;RTVPf

Sequence data that has been assembled or reconstructed from sequencing reads is usually stored in a slightly different format. The “fasta” format (.fasta/.fa/.fna) is similar to fastq, except without the quality scores for each base. Each record consists of a header line prefixed with a “>” character, followed on the next line(s) by the actual sequence data in the form of ACTG/actg and N (to represent unknown bases). For example:

>Chromosome1
ACGTACTGATCAGATCGATCGAGCGGGCGCGCGCAATGCATCGA
>Chromosome2
ACGCTAGCTAGCTAGCTAGCATCGATCGACA
>Chromosome3
GCATGCATCGATCGATGCACGGCATCG
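
Because the fasta layout is so simple, it is easy to summarise with standard command line tools. The sketch below is only an illustration (the function name fasta_lengths is my own, not part of any installed program): it reads fasta records and prints the length of each sequence, assuming every header starts with “>” and that a sequence may be spread over several lines.

```bash
##Bash/command line code
# fasta_lengths: read fasta on stdin, print "name<TAB>length" for each record.
# Hypothetical helper for illustration only.
fasta_lengths() {
    awk '
        /^>/ {
            # a new header: report the previous record (if any), then reset
            if (name != "") print name "\t" len
            name = substr($0, 2)
            len = 0
            next
        }
        { len += length($0) }   # sequence line: add its bases to the running total
        END { if (name != "") print name "\t" len }
    '
}

# Try it on the three example records shown above
printf '%s\n' \
    ">Chromosome1" "ACGTACTGATCAGATCGATCGAGCGGGCGCGCGCAATGCATCGA" \
    ">Chromosome2" "ACGCTAGCTAGCTAGCTAGCATCGATCGACA" \
    ">Chromosome3" "GCATGCATCGATCGATGCACGGCATCG" | fasta_lengths
```

On a real file you would redirect it into the function instead, for example: fasta_lengths < my_assembly.fasta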

Q2:

A. fastq files generally contain what type of information?

  1. Sequencing reads, containing sequences of nucleotide bases.

  2. Sequencing reads, containing the quality scores of nucleotide base calls.

  3. Sequencing reads, containing both nucleotide base calls and corresponding quality scores.

B. The Sequence Read Archive (SRA) contains what type of data?

3. Pre-Assembly QC

The type of sequencing and assembly you may want to do can depend on your study organism. Some species have intrinsically more or less complex genomes. There are a few important considerations that can inform your strategy, but sometimes it is difficult to know what these may be without data. Some major considerations are sequencing coverage, ploidy, heterozygosity, repeat density and repeat structure.

3.1. Sequencing coverage

Sequencing coverage is the average number of times that any given base-pair in your genome appears in your sequencing read data. Put another way, it is the ratio of sequence read data to actual genome size, so if we imagine a 100Mbp haploid genome and we have 1Gbp of sequencing data, we would estimate that we have a sequencing coverage of 10. In practice, this estimate is usually roughly accurate for the median coverage value, but because reads are sampled randomly and sequencing is biased towards some regions, coverage varies across the genome according to a distribution.
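
To make the arithmetic concrete, here is a small sketch of the calculation. The function names (count_bases, estimate_coverage) are hypothetical helpers, and count_bases assumes a simple four-line-per-read fastq file where the sequence is not wrapped over several lines.

```bash
##Bash/command line code
# count_bases: read fastq on stdin and print the total number of sequenced bases.
# Assumes 4 lines per record, so the sequence is always line 2 of each group.
count_bases() {
    awk 'NR % 4 == 2 { total += length($0) } END { print total + 0 }'
}

# estimate_coverage TOTAL_BASES GENOME_SIZE: both values in base pairs.
estimate_coverage() {
    local total_bases=$1
    local genome_size=$2
    awk -v t="$total_bases" -v g="$genome_size" 'BEGIN { printf "%.1f\n", t / g }'
}

# The example from the text: 1Gbp of reads from a 100Mbp haploid genome
estimate_coverage 1000000000 100000000   # prints 10.0
```

For a real dataset you could combine the two, for example: estimate_coverage "$(count_bases < reads.fastq)" 100000000, where the genome size is your best available estimate.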

Let’s have a look at our datasets again. First, let’s have a look at the basic statistics for the HiFi data using the BBTools suite. The command is written like this:

##Bash/command line code
reformat.sh in1=SRR10971019.fastq lhist=HiFi_length.txt

To help you get started I will give you an example SLURM script which you can copy to run this command:

#!/bin/bash
#SBATCH --job-name=bb_reformat
#SBATCH --output=reformat_result.txt
#SBATCH --partition=nu-wide
#SBATCH --ntasks=1
#SBATCH --time=0:10:00

module load bbmap/37.36

srun reformat.sh in1=/home/taller-2019/Workshop_materials/SRR10971019.fastq lhist=HiFi_length.txt

Remember to run the SLURM script with “sbatch”!

This should only take a minute or two to run, and you can then view the resulting output file, like this:

less reformat_result.txt

We can see that we have 95,514 reads and roughly 1.4Gbp in total. This command also prints a histogram of read lengths to the file “HiFi_length.txt”. You can try downloading this file to your local computer using a command like this:

scp username@kabre.cenat.ac.cr:~/HiFi_length.txt ~/Desktop

You can then try to plot this data using R, if you are interested in visualisation.

##R code
library(ggplot2)
HiFi_hist <- read.table("HiFi_length.txt")
ggplot(data=HiFi_hist,aes(x=V1,y=V2)) + geom_col()

A very useful initial QC procedure is to break sequencing reads down into short (e.g. 21 nucleotide) overlapping sub-strings called k-mers (in this case we would call them 21-mers), which gives us a more accurate assessment of genome properties. k-mers are useful because some sequencing reads contain errors. By breaking reads down into sub-strings, we can distinguish technical errors from biological sequence based on the number of times each k-mer appears in the dataset: error k-mers should be very rare, whereas “real” biological k-mers should appear at a frequency that depends on the sequencing coverage.
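
To see the idea on a toy scale, the sketch below counts k-mers with awk. It is only an illustration: the function name count_kmers is hypothetical, the three “reads” are invented, and unlike real k-mer counters it does not merge a k-mer with its reverse complement.

```bash
##Bash/command line code
# count_kmers K: read one sequence per line on stdin, print "kmer<TAB>count".
# Hypothetical helper for illustration only.
count_kmers() {
    local k=$1
    awk -v k="$k" '
        {
            # slide a window of width k along the read, one base at a time
            for (i = 1; i + k - 1 <= length($0); i++) {
                counts[substr($0, i, k)]++
            }
        }
        END {
            for (kmer in counts) print kmer "\t" counts[kmer]
        }
    ' | sort
}

# Three toy reads covering the same region; the third has one error (T -> A)
printf 'ACGTAC\nACGTAC\nACGAAC\n' | count_kmers 4
```

The 4-mers from the two correct reads are each seen twice, while the three 4-mers that overlap the error are each seen only once. With real data the same separation shows up as a spike of very low-count k-mers at the left of the histogram.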

Let’s have a look at the k-mer content to find out more about our dataset and our organism. Here’s how you could generate a 21-mer histogram for the data we downloaded earlier, using the program jellyfish.

##Bash/command line code
jellyfish count -t 20 -m21 -C -s 10G -o ecoli_HiSeqX.jf21 SRR18106304.fastq
jellyfish histo -o ecoli_HiSeqX.histo ecoli_HiSeqX.jf21

##R code
library(ggplot2)
HiSeq_kmers <- read.table("ecoli_HiSeqX.histo")
#you may need to adjust the x and y axis limits, depending on the data.
ggplot(data=HiSeq_kmers,aes(x=V1,y=V2)) + geom_point() + xlim(0,500) + ylim(0,100000)

We can also feed our data into another program (genomescope), which is able to fit a model to the histogram and calculate some predictions about the genome properties, such as genome size, heterozygosity (if present), repetitiveness and sequencing coverage.

##Bash/command line code
/home/taller-2019/Workshop_materials/genomescope.R ecoli_HiSeqX.histo 21 150 hiseq_genomescope
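
GenomeScope fits a proper statistical model, but the intuition behind its genome size estimate can be sketched by hand: ignore the low-count error k-mers, find the coverage at the main peak of the histogram, and divide the total number of remaining k-mers by that coverage. The function below is a rough illustration only. Its name (estimate_genome_size) is hypothetical, the default error cut-off of 5 is an arbitrary assumption, and it assumes a single main peak (as in a haploid genome). The input is the two-column histogram format written by “jellyfish histo” (k-mer count, then number of distinct k-mers with that count).

```bash
##Bash/command line code
# estimate_genome_size [MIN_COUNT]: read a k-mer histogram on stdin.
# k-mers seen fewer than MIN_COUNT times are treated as errors (default 5, an assumption).
estimate_genome_size() {
    local min_count=${1:-5}
    awk -v min="$min_count" '
        BEGIN { total = 0; peak_count = 0; peak_cov = 0 }
        $1 >= min {
            total += $1 * $2                    # total k-mer occurrences above the cut-off
            if ($2 > peak_count) {              # remember the tallest bin (the main peak)
                peak_count = $2
                peak_cov = $1
            }
        }
        END {
            if (peak_cov > 0)
                printf "peak_coverage=%d genome_size=%d\n", peak_cov, total / peak_cov
        }
    '
}

# Invented toy histogram: an error spike at counts 1-2 and a peak at count 10
printf '1 1000\n2 100\n9 10\n10 80\n11 10\n' | estimate_genome_size
# prints: peak_coverage=10 genome_size=100
```

You could try the same function on your own histogram (estimate_genome_size < ecoli_HiSeqX.histo) and compare the answer with the GenomeScope report; expect the two to differ, since this sketch ignores repeats and heterozygosity.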

Q3: For each dataset:

A. How much total sequencing data do we have for each dataset?

B. What is the estimated size of this genome?

C. What is the approximate sequencing coverage for each dataset?

D. Which dataset has the highest error rate?

3.2. Ploidy

Ploidy refers to the number of copies of each chromosome that an organism has. A single copy is haploid and two copies is diploid (most sexually reproducing organisms are diploid). Higher numbers of copies (the general term is polyploid) are also commonly found in nature, particularly in plants.

It is important to note that different ploidy states can exist within an organism. In humans, for example, the autosomes are most often diploid, the mitochondrial genome is most often haploid, and the X chromosome can be either haploid or diploid.

In addition, ploidy levels can vary between different tissues. Some cells found in the liver are normally polyploid, and karyotype alterations have also been linked to cancerous tissues. In another case, honey bees can have entirely haploid autosomes (which initiates male development) or diploid autosomes (which initiates female development). Choosing which individual and tissue to sample for whole genome sequencing is a crucial decision!

Figure 2: k-mer histograms showing the characteristic double peak of a diploid organism, with the homozygous peak region indicated in green, the heterozygous peak region indicated in blue and the error region peak indicated in red. Notice the difference in the error region, between the Illumina and the PacBio CLR read data.

Q4:

A. Our datasets represent a genome with what type of ploidy?

B. The data for a different organism, shown in Figure 2, has what type of ploidy?

C. An organism with more than two copies of each chromosome has what type of ploidy?

D. In Figure 2, why does the PacBio CLR data have more k-mers in the red region?

3.3 Contamination/Endosymbionts

Despite best intentions, non-target DNA is often included in whole genome sequencing experiments. The level of contaminant DNA may be affected by the organism’s size, preservation, storage, parasitism, endosymbionts and other factors. It can be useful to screen sequencing read data for contamination. If the level is low, it may be possible to filter out contaminant reads pre-assembly, or contaminant contigs post-assembly (although erroneous incorporation into the target genome is possible) (Koutsovoulos et al., 2016).

Here’s a program called mash that you could use to screen your sequencing reads for contaminant DNA.

##Bash/command line code

mash screen refseq.genomes.k21s1000.msh SRR18106304.fastq > screen.tab
sort -gr screen.tab | head

The first column of the output is the identity score. It is a metric of the fraction of bases shared between the genome and your sequencing reads (this is estimated from the fraction of shared k-mers), and so high values indicate a likely presence of a species in your data.
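
If the table is long, it can help to keep only the strongest hits. The sketch below assumes the tab-separated column order that I believe “mash screen” writes (identity, shared hashes, median multiplicity, p-value, query ID, query comment); please check this against your own screen.tab before relying on it. The function name screen_hits, the 0.95 threshold and the two example rows are all invented for illustration.

```bash
##Bash/command line code
# screen_hits MIN_IDENTITY: read mash screen output on stdin and print
# "identity<TAB>query ID" for hits at or above the threshold, best first.
# Assumes identity is column 1 and the query ID is column 5.
screen_hits() {
    local min_identity=$1
    awk -F'\t' -v min="$min_identity" '$1 >= min { print $1 "\t" $5 }' | sort -gr
}

# Two invented rows: one strong hit and one weak hit
printf '0.99952\t990/1000\t30\t0\tgenome_A.fna\ttoy target genome\n0.81873\t15/1000\t1\t0.01\tgenome_B.fna\ttoy unrelated genome\n' |
    screen_hits 0.95
```

With your own results you would run: screen_hits 0.95 < screen.tab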

Q5:

A. Do any of our datasets contain contaminant DNA?

B. Should you remove contaminant sequencing reads before or after assembly?

4. Assembly

4.1. Read QC

Prior to assembly, it can be useful to check whether there are any potential issues with our data. One important consideration is adapter contamination. Adapters are short synthetic sequences of known composition that are often used as part of high-throughput sequencing reactions. Sometimes traces of these synthetic sequences are present in sequencing reads and must be removed from the associated biological sequence.

We can use a tool called fastqc to generate a useful report, like this:

fastqc SRR18106304.fastq

We can then view the resulting html file with a browser, like this:

unzip SRR18106304_fastqc.zip
firefox SRR18106304_fastqc.html

fastqc is designed to indicate when data doesn’t meet theoretical expectations, but this does not necessarily mean there is a problem with the data.

If we suspected that there might be some adapter contamination based on the fastqc results, we could use a trimming program to try and reduce the problem. Here is an example.

bbduk requires the module “bbmap/37.36”
bbduk.sh in=SRR18106304.fastq out=SRR18106304_clean.fastq ref=/home/taller-2019/Workshop_materials/adapters.fa ktrim=l k=23 mink=11 hdist=1
  • ref = the fasta file of adapter sequences you want to remove

  • ktrim = whether the adapter is expected on the left (5-prime) or right (3-prime)

  • k = the length of a k-mer used to identify contaminant sequence

  • mink = look for shorter kmers of this length at the ends of reads

  • hdist = allows a number of substitutions

4.2. Contig generation

There are a number of algorithms that can be used to reconstruct genomic sequence from reads. Two of the most common are the de Bruijn graph and overlap-layout-consensus (OLC) approaches. The de Bruijn graph utilises overlapping fixed-length k-mers to reconstruct the sequence. As previously discussed, k-mers have some useful properties for assembly; however, they cannot resolve sequences that are repeated throughout the genome if the repeats are longer than k (for example transposable elements).

de Bruijn 1: The code below decomposes an input string “This is good, This is bad” into 7-mers. When we already know the order of these chunks we can lay them out as shown, to reconstruct the original string. But what if we don’t know the original order?
[1] "  This is"
[1] "   his is "
[1] "    is is g"
[1] "     s is go"
[1] "       is goo"
[1] "       is good"
[1] "        s good,"
[1] "          good, "
[1] "          good, T"
[1] "           ood, Th"
[1] "            od, Thi"
[1] "             d, This"
[1] "              , This "
[1] "                This i"
[1] "                This is"
[1] "                 his is "
[1] "                  is is b"
[1] "                   s is ba"
[1] "                     is bad"
de Bruijn 2: This code splits each 7-mer into two 6-mers, corresponding to the left 6 prefix characters and the right 6 suffix characters. For example, the word “example” becomes “exampl” and “xample”. We can loop through all of the prefix 6-mers and find matches with suffix 6-mers, and thus reconstruct larger runs, as shown above. However, you will notice that, due to the repetition of the phrase “This is”, the graph has multiple valid paths.
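
If you would like to experiment with this yourself on the command line, here is a minimal sketch of the same idea written with awk. It is not the code used to draw the figures, and the function name debruijn_branches is hypothetical. It breaks each input line into k-mers, treats the (k-1)-character prefix and suffix of each k-mer as two nodes joined by an edge, and reports any node with more than one distinct outgoing edge, which is where an assembler can no longer tell which path to follow.

```bash
##Bash/command line code
# debruijn_branches K: read text on stdin, build de Bruijn graph edges from its
# k-mers, and print every node that has more than one distinct outgoing edge.
debruijn_branches() {
    local k=$1
    awk -v k="$k" '
        {
            for (i = 1; i + k - 1 <= length($0); i++) {
                kmer = substr($0, i, k)
                prefix = substr(kmer, 1, k - 1)   # left node of the edge
                suffix = substr(kmer, 2)          # right node of the edge
                edge = prefix SUBSEP suffix
                if (!(edge in seen)) {            # count each distinct edge only once
                    seen[edge] = 1
                    out_degree[prefix]++
                }
            }
        }
        END {
            for (node in out_degree)
                if (out_degree[node] > 1)
                    printf "[%s] has %d outgoing edges\n", node, out_degree[node]
        }
    '
}

# 7-mers: the repeated phrase creates one ambiguous node
printf '%s\n' "This is good, This is bad" | debruijn_branches 7
# prints: [is is ] has 2 outgoing edges

# 12-mers: longer than the repeat, so no branching nodes are reported
printf '%s\n' "This is good, This is bad" | debruijn_branches 12
```

Try a few values of k between 7 and 12 to find the point at which the branch disappears, and compare it with the length of the repeated phrase.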

de Bruijn 3: This looks quite messy: there is a cyclical structure and a spur, and there is no simple route through the graph. Let’s see what happens when we increase the k-mer length from 7-mers to 12-mers.

Let’s have a go at running a full de Bruijn graph assembly on the Illumina reads, using the program SPAdes. This program is able to use k-mers of multiple lengths and combine the resulting graphs for a more contiguous assembly.

You will need to run assembly programs using SLURM scripts. These programs may need large amounts of RAM and CPU, so consider running on the dribe-long partition, with a large amount of time.
##Bash/command line code
spades.py -s SRR18106304.fastq -o ecoli_hiseq_spades

Q6:

A. De Bruijn graphs are a useful method for which kind of data?

B. Longer k-mers may help produce more contiguous assemblies. True or False?

C. OLC graphs are a useful method for which kind of data?