DNA Technology – answers to the exercises

Exercise 1: Query a public database

What effect does this variant have?

It causes a premature translational stop: Glu1157X (the glutamic acid at position 1157 is replaced by a stop codon).

What are the characteristics of the FASTA format?

The first line starts with ">" followed by the identifier of the sequence (a gi number in older GenBank records, an accession code in other databases) and, optionally, a description of the sequence. The sequence itself follows on the subsequent lines. See https://en.wikipedia.org/wiki/FASTA_format
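The layout described above can be read with a few lines of Python. This is a minimal sketch, not the tool used in the exercise: the function name parse_fasta and the two example records are made up for illustration.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical example input: two records, the first split over two lines.
example = """>seq1 hypothetical example record
ACGTACGT
ACGT
>seq2
GGGTTT
"""
records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

The identifier is the part of the header before the first space; everything after it is the optional description.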

Exercise 2: Data conversion/translation

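How many lines in a FASTQ file describe one sequence?

Four lines. The first starts with "@" followed by the sequence identifier, the second contains the sequence, the third starts with "+" and optionally repeats the identifier or a description, and the fourth contains the base quality scores (one character per base).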
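The four-line structure can be checked with a short Python sketch. It assumes the common Phred+33 quality encoding (quality = ASCII code minus 33), which may not apply to older files; the function name parse_fastq and the two reads are made up for illustration.

```python
def parse_fastq(text):
    """Return (identifier, sequence, qualities) tuples from FASTQ text.

    Assumes exactly four lines per record and Phred+33 quality encoding.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    records = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lines differ in length")
        # Phred+33 (assumption): subtract 33 from each character's ASCII code.
        scores = [ord(char) - 33 for char in quality]
        records.append((header[1:], sequence, scores))
    return records


# Hypothetical example input with two reads.
example = "@read1 hypothetical example\nACGTN\n+\nIIII!\n@read2\nGGCC\n+read2\n####\n"
reads = parse_fastq(example)
for name, sequence, scores in reads:
    print(name, sequence, scores)
```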

How many sequences does the FASTA file contain?

11,052 sequences
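Because every FASTA record has exactly one header line, the number of sequences equals the number of lines starting with ">". A minimal sketch, using an in-memory stand-in for the converted file (the function name and the example content are hypothetical):

```python
import io


def count_fasta_records(lines):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


# Stand-in for an open file handle; with a real file you would pass
# the handle returned by open(...) instead.
handle = io.StringIO(">a\nACGT\n>b\nGG\nTT\n>c\nA\n")
n_records = count_fasta_records(handle)
print(n_records)
```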

Which organism(s) are in the dataset?

Candidatus Acetothermus autotrophicum DNA, large contig sequence, contig 4

Exercise 3: Compare two sequences to identify mutations

Which mutation causes Sickle cell disease?

The sixth amino acid in the hemoglobin beta chain (HBB) is mutated from glutamic acid to valine (Glu6Val)
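Comparing two aligned sequences position by position reveals the substitution. The sketch below uses the first nine codons of the HBB coding sequence written out by hand for illustration (take the real sequences from the database in the exercise); the function name is hypothetical. Note that the traditional Glu6Val numbering does not count the initiator methionine, so the affected codon is the seventh when counting from ATG.

```python
def find_substitutions(reference, variant):
    """Return (1-based position, reference base, variant base) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (index + 1, ref, var)
        for index, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]


# Start of the coding sequence: ATG GTG CAT CTG ACT CCT GAG GAG AAG
normal = "ATGGTGCATCTGACTCCTGAGGAGAAG"
sickle = "ATGGTGCATCTGACTCCTGTGGAGAAG"

differences = find_substitutions(normal, sickle)
for position, ref, var in differences:
    codon_from_atg = (position - 1) // 3 + 1
    # Subtract 1 to skip the initiator methionine (traditional numbering).
    print(position, ref, var, "codon", codon_from_atg, "residue", codon_from_atg - 1)
```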

Identify the mutation which causes the change in amino acid

The amino acid “X” is not a variant. It appears because the HBS sequence contains letters other than A, C, G or T. These letters are IUPAC ambiguity codes (see IUPAC list): “N” means that the base could not be determined, and “Y” represents a “T” or a “C”, which in this case may or may not match the base in the normal HBB sequence.
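To separate real mismatches from ambiguity codes, each IUPAC letter can be expanded to the set of bases it stands for. The sketch below assumes the reference contains only A, C, G and T; the function name and the short "observed" read are made up for illustration.

```python
# IUPAC nucleotide codes and the bases each one can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def classify_positions(reference, observed):
    """Report positions where the observed letter is not an exact match.

    'ambiguous' means the IUPAC code could still be the reference base;
    'mismatch' means it cannot.
    """
    if len(reference) != len(observed):
        raise ValueError("sequences must be aligned and of equal length")
    report = []
    pairs = zip(reference.upper(), observed.upper())
    for position, (ref, obs) in enumerate(pairs, start=1):
        possible = IUPAC[obs]
        if possible == ref:
            continue  # unambiguous match
        status = "ambiguous" if ref in possible else "mismatch"
        report.append((position, ref, obs, status))
    return report


# Hypothetical toy read with two ambiguity codes and one true substitution.
report = classify_positions("CCTGAGGAG", "CCYGNGGTG")
for row in report:
    print(row)
```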

Exercise 4: Pick primers to screen patients for the HBS mutation

Does the resulting primer set include the region with the mutation?

Yes! The easy way is to search for the start codon (ATG) in your browser and check whether the sixth codon after it lies between the two primers (the Glu6Val numbering does not count the initiator methionine).
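The same check can be scripted: locate both primers on the template and test whether the codon of interest falls inside the amplicon. This is a rough sketch under simplifying assumptions (exact primer matches, a single ATG before the codon); the flanking sequences and primers are invented for illustration and are not the primers from the exercise.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_span(template, forward, reverse):
    """Return (start, end) of the amplicon on the template, or None if not found."""
    start = template.find(forward)
    # The reverse primer binds the opposite strand, so search its reverse complement.
    reverse_site = template.find(reverse_complement(reverse))
    if start == -1 or reverse_site == -1 or reverse_site < start:
        return None
    return start, reverse_site + len(reverse)


# Hypothetical template: invented flanks around the start of the HBB coding sequence.
template = "CCAAGGTTAC" + "ATGGTGCATCTGACTCCTGAGGAGAAG" + "TTCGGAACCT"
forward_primer = "CCAAGGTTAC"   # hypothetical
reverse_primer = "AGGTTCCGAA"   # hypothetical

span = amplicon_span(template, forward_primer, reverse_primer)
atg = template.find("ATG")
codon_start = atg + 6 * 3       # sixth codon after ATG
codon = template[codon_start:codon_start + 3]
inside = span is not None and span[0] <= codon_start and codon_start + 3 <= span[1]
print(span, codon, inside)
```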

What is the rs-id of the variant corresponding with the HBS phenotype?

rs334. In the dbSNP database you will find the frequency of the variant in different populations.

Exercise 5: Gene finding

What is wrong with this transcript?

The mutation at position 3713 (G>T) causes a premature translational stop. See the first exercise.
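Why a G>T change in a glutamic acid codon produces a stop can be verified against the standard genetic code. The sketch below builds the codon table and lists every G>T substitution in a Glu codon that yields a stop codon; the function name is hypothetical.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def g_to_t_stop_codons(amino_acid="E"):
    """List (codon, mutated codon, codon position) where G>T creates a stop."""
    results = []
    for codon, encoded in sorted(CODON_TABLE.items()):
        if encoded != amino_acid:
            continue
        for index, base in enumerate(codon):
            if base != "G":
                continue
            mutated = codon[:index] + "T" + codon[index + 1:]
            if CODON_TABLE[mutated] == "*":
                results.append((codon, mutated, index + 1))
    return results


stops = g_to_t_stop_codons("E")
for original, mutated, position in stops:
    print(original, "->", mutated, "at codon position", position)
```

Both glutamic acid codons (GAA and GAG) become stop codons (TAA and TAG) when the first base changes from G to T.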