Comparative genomics is a branch of biology that compares the genome sequences of different species, from microorganisms to humans. By comparing genome sequences from different organisms, researchers can learn what distinguishes life forms at the molecular level. Comparative genomics is also useful for studying evolution, because it helps identify genes that are conserved or shared across species as well as genes that give each organism its unique characteristics.
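The distinction between shared genes and organism-specific genes can be illustrated with simple set operations. The following is only a minimal sketch: it assumes each genome has already been reduced to a set of gene-family labels, and the organism names and labels are hypothetical placeholders, not real data.

```python
def partition_genes(genomes):
    """Split gene families into those shared by every genome and those unique to one.

    `genomes` maps an organism name to a set of gene-family labels
    (hypothetical labels; at least one genome is required).
    """
    shared = set.intersection(*genomes.values())
    unique = {}
    for name, genes in genomes.items():
        # Union of the gene families found in all the other genomes.
        others = set().union(*(g for n, g in genomes.items() if n != name))
        unique[name] = genes - others
    return shared, unique


# Hypothetical example input, for illustration only.
example = {
    "organism_1": {"famA", "famB", "famC"},
    "organism_2": {"famA", "famB", "famD"},
    "organism_3": {"famA", "famB", "famE"},
}
shared, unique = partition_genes(example)
print(sorted(shared))        # ['famA', 'famB']
print(unique["organism_1"])  # {'famC'}
```

In practice, deciding which genes belong to the same family requires sequence comparison, which this sketch takes as given.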
Microbial whole genome sequencing is now fast and inexpensive enough to be considered a tool for bacterial research. This work is done by a diverse group of people, including researchers, public health practitioners, and clinicians, who are interested in a wide range of topics related to bacterial genetics and evolution. Applications include the study of clinical isolates, laboratory strains, and mutants, the investigation of outbreaks, and the evolution and spread of drug resistance. Many labs can now generate bacterial genome sequences in-house in a matter of hours or days using benchtop sequencers like the Illumina MiSeq, Ion Torrent PGM, or Roche 454 FLX Junior.
Figure 1. An example of comparative bacterial genome analysis. (Xia, 2016)
The analysis workflow has five logical stages: assembly, contig ordering, annotation, genome comparison, and typing.
De novo assembly is the process of combining overlapping sequence reads into contiguous sequences (contigs) without using a reference genome as a guide. For short-read data, assemblers that use de Bruijn graphs are typically the most efficient. The open-source program Velvet was one of the first de Bruijn graph assemblers and is among the most widely used. It remains one of the most used (and cited) assemblers for bacterial genomes and is particularly well suited to Illumina sequence reads. Further development has improved its resolution of repeats and its scaffolding using paired-end and longer reads.
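The idea behind de Bruijn graph assembly can be sketched in a few lines. This is a toy illustration under strong assumptions (error-free reads, a single strand, no repeats longer than k-1), not a description of how Velvet is implemented; the function names and the example reads are made up for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are distinct k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extract_contigs(graph):
    """Spell out contigs by walking non-branching paths.

    Isolated cycles (e.g. a circular sequence with no branch point)
    are not handled in this sketch.
    """
    in_degree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1
    nodes = set(graph) | set(in_degree)

    def is_simple(node):
        # Exactly one edge in and one edge out: safe to walk through.
        return in_degree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if is_simple(node):
            continue  # contigs start only at path ends or branch points
        for successor in sorted(graph.get(node, ())):
            contig = node + successor[-1]
            current = successor
            while is_simple(current):
                current = next(iter(graph[current]))
                contig += current[-1]
            contigs.append(contig)
    return contigs


# Three hypothetical overlapping reads from the sequence ATGGCGTGCAAT.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
print(extract_contigs(build_de_bruijn(reads, k=4)))  # ['ATGGCGTGCAAT']
```

Real assemblers additionally have to cope with sequencing errors, both strands, and repeats, which is where features such as paired-end scaffolding come in.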
Ordering and viewing assembled contigs
Once the contigs have been assembled from the sequencing reads, the next step is to order them against a suitable reference genome. The best reference is usually the most closely related bacterium with a 'completed' genome, but finding it may require trial and error, as in the case of E. coli O104:H4. Contig ordering can be done with command-line tools like MUMmer, and a wrapper program like ABACAS can make this easier.
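The principle of ordering contigs against a reference can be sketched as follows. This is a simplified stand-in that places each contig by an exact match of its first few bases, trying both strands; it is not how MUMmer or ABACAS work internally, since those rely on whole-genome alignment. The function names, the `seed_len` parameter, and the sequences are assumptions made for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def order_contigs(contigs, reference, seed_len=6):
    """Order contigs by where a short seed from each one matches the reference.

    Returns (ordered, unplaced): `ordered` is a list of
    (name, strand, position) sorted by position, and `unplaced`
    lists contigs whose seed matched on neither strand.
    """
    placed, unplaced = [], []
    for name, seq in contigs.items():
        hit = None
        for strand, oriented in (("+", seq), ("-", reverse_complement(seq))):
            # Exact seed lookup: a toy substitute for a real alignment.
            pos = reference.find(oriented[:seed_len])
            if pos != -1:
                hit = (pos, name, strand)
                break
        if hit is None:
            unplaced.append(name)
        else:
            placed.append(hit)
    placed.sort()
    return [(name, strand, pos) for pos, name, strand in placed], unplaced


# Hypothetical reference and contigs, for illustration only.
reference = "ATGGCGTACGTTAGCAAGTCTGATCCGATTACAG"
contigs = {
    "c1": "TGATCCGATTACAG",  # matches the reference at position 20
    "c2": "GACTTGCTAA",      # reverse complement of reference[10:20]
    "c3": "ATGGCGTACG",      # matches the reference at position 0
    "c4": "CCCCCCCCCC",      # no match on either strand
}
ordered, unplaced = order_contigs(contigs, reference)
print(ordered)   # [('c3', '+', 0), ('c2', '-', 10), ('c1', '+', 20)]
print(unplaced)  # ['c4']
```

Contigs that cannot be placed, such as `c4` here, are one reason why choosing a closely related reference matters.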
Once the ordered set of contigs has been obtained, the next step is to annotate the draft genome. Annotation is the process of locating and identifying genes, as well as ribosomal and transfer RNAs, encoded in the genome. The most straightforward way to annotate a bacterial genome is to upload the assembly to an automated web-based tool like RAST. Many command-line annotation tools are also available. These include de novo gene discovery tools like Prokka and DIYA, as well as programs like RATT and BG-7 that transfer annotation directly from closely related genomes.
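To give a feel for what "locating genes" means computationally, the sketch below scans the forward strand for open reading frames (an ATG start codon followed in-frame by a stop codon). It is a deliberately naive illustration and not the method used by RAST, Prokka, or the other tools named above; the function name, the `min_codons` threshold, and the example sequence are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Find simple ORFs on the forward strand in all three reading frames.

    Returns a sorted list of (start, end, frame) tuples, where
    sequence[start:end] runs from the ATG through the stop codon.
    `min_codons` counts codons before the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon since the last stop
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)


# Hypothetical sequence with one short ORF in reading frame 2.
seq = "CCATGAAACCCGGGTAGTT"
for start, end, frame in find_orfs(seq):
    print(start, end, frame, seq[start:end])  # 2 17 2 ATGAAACCCGGGTAG
```

A real annotation pipeline would also search the reverse strand, predict RNA genes, and assign functions to the predicted genes.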
Comparative genomics has a wide range of applications in molecular medicine and molecular evolution. In molecular medicine, its most important application is the identification of drug targets for many infectious diseases. It can also guide the selection of model organisms. In addition, clustering regulatory sites across genomes can help identify previously unknown regulatory regions in other genomes.