
Introduction


One of the difficulties in genome sequencing is closing the gaps during the assembly stage. At each stage of the project, the assembly contains a different number of contigs. During assembly, these contigs are examined for conserved sequence and other factors that may predict their orientation and location. Thus, in successive assemblies, the expectation is to have fewer contigs as they are combined. However, in some cases contigs do not overlap because the sequence between them is missing. This makes orienting the contigs very difficult. There are methods that can be used to close the gaps between contigs, but they can be expensive or insufficient. The Streptococcus sanguis project currently has over one hundred contigs that have yet to be oriented properly. Part of my work this summer will involve helping with gap closure in the S. sanguis genome.

Once the gaps are closed and the sequence is finished, the genome must be annotated. Annotation allows the investigator to identify previously documented open reading frames (ORFs), as well as ORFs that code for novel or unique proteins. Annotation can lead to the labeling of proteins known to act as virulence determinants, which in turn can lead to developing vaccines against the organism. I also hope to work on annotating the genome once it is finished.

In comparative genomics, researchers use a completed genome to compare an organism to its close relatives. For example, a number of different streptococci have been sequenced. Comparing the genome of S. sanguis to those of other streptococci would make it possible to identify unique proteins and functions. This could give clues, for example, about the organism’s metabolic pathways, or its particular adaptations to its host environment. I will probably not do this work this summer because of time constraints, but I will consider it for the future.


Methods


There are a number of ways that gap closure can be achieved. The most common way is to use genome walking, where primers are designed for a number of contigs, PCR amplification is performed, and resulting sequences are examined for high-quality overlap. With a large number of contigs, this process can be expensive and time-consuming. One way to reduce the randomness involved is to compare contigs to similar sequences in a database. This method can elucidate the orientation of contigs, or suggest two particular contigs for genome walking. A program has been written by one of the students at BBSI that applies this method, but it remains insufficient to orient all of the contigs.
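
As a rough illustration of the database-comparison idea, the Python sketch below orders contigs by where their best match falls on a related reference sequence and lists neighbouring contigs as candidate pairs for genome walking. It is not the BBSI program mentioned above. The contig names, coordinates, and the Hit record are hypothetical, and the sketch assumes each contig has already been reduced to a single best hit by a similarity search.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Best match of one contig against a reference sequence (hypothetical data)."""
    contig: str     # contig identifier
    ref_start: int  # start of the match on the reference
    ref_end: int    # end of the match on the reference
    strand: str     # "+" if the contig matches as-is, "-" if reverse-complemented


def order_contigs(hits):
    """Sort contigs by the leftmost reference coordinate of their best hit."""
    return sorted(hits, key=lambda h: min(h.ref_start, h.ref_end))


def candidate_pairs(ordered):
    """Pair each contig with its neighbour in reference order.

    Returns (left contig, right contig, estimated gap on the reference).
    A negative gap means the two hits overlap on the reference.
    """
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        left_end = max(left.ref_start, left.ref_end)
        right_start = min(right.ref_start, right.ref_end)
        pairs.append((left.contig, right.contig, right_start - left_end))
    return pairs


# Made-up example: three contigs with hits on a related genome.
hits = [
    Hit("contig_B", 5200, 9100, "-"),
    Hit("contig_A", 100, 4800, "+"),
    Hit("contig_C", 9500, 12000, "+"),
]
ordered = order_contigs(hits)
pairs = candidate_pairs(ordered)
for h in ordered:
    print(h.contig, h.strand)
for left, right, gap in pairs:
    print(f"try primers between {left} and {right} (estimated gap {gap} bp)")
```

The strand of each hit suggests the contig's orientation, and the neighbouring pairs narrow down which primer combinations are worth trying first. Contigs with no match in the related genome are simply absent from the list, which is one reason such a method may not orient every contig.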

Annotation is done with the help of computer programs. A program like GLIMMER would probably be used to search the sequence for open reading frames. A program like BLAST or FASTA3 would then probably be used to annotate those open reading frames by comparing them to known sequences. Comparative genomics would also make use of software tools.
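
To make the idea of searching for open reading frames concrete, here is a minimal Python sketch of a naive ORF scan. It is not how GLIMMER works (GLIMMER scores candidate genes with a statistical model); it only looks for an ATG start codon followed in the same frame by a stop codon. The function names, the min_codons threshold, and the example sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement so the opposite strand can be scanned too."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Find ATG...stop ORFs in the three reading frames of one strand.

    Returns (start, end, frame) tuples: start is the index of the A in ATG,
    end is the index just past the stop codon. min_codons counts the codons
    before the stop codon, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Made-up example sequence containing one short ORF: ATG AAA CCC GGG TAA
example = "CCATGAAACCCGGGTAACC"
forward_orfs = find_orfs(example)
reverse_orfs = find_orfs(reverse_complement(example))
print(forward_orfs, reverse_orfs)
```

In practice min_codons would be set much higher to discard short spurious ORFs, and each predicted ORF would then be compared to a database with a tool such as BLAST to assign a likely function.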


Possible Results


The most concrete result is the completion of the S. sanguis genome. The genome would have to be finished (its sequence verified for accuracy) and annotated. Then it can be submitted as a finished product to genome databases. Furthermore, examination of its open reading frames, and testing of the expression of predicted proteins, can lead to the development of a vaccine against S. sanguis. We can also gain a greater understanding of the processes that underlie the organism's morphology and metabolism. In short, once the genome is completed, the avenues for study are myriad.
