Gene prediction, or gene finding, is an essential step that identifies gene regions in genomes, whether those genes are protein-coding or non-coding.
Databases hold many assembled genomes that are used in a wide range of genetic studies, but only a few of these genomes are annotated.
In addition, only a very small number of genomes have up-to-date annotations.
Re-annotation is therefore an essential step in any research that works with genomes taken from databases.
This course presents practical, step-by-step, screen-recorded demonstrations of gene prediction software, with more than one example.
The course provides a set of protocols that have been adapted and tested so that they run without errors.
The course covers organisms ranging from prokaryotes to eukaryotes.
It covers both protein-coding and non-coding genes.
It also covers how to evaluate the predictions produced by gene prediction software.
Some programs are general-purpose and work with both prokaryotes and eukaryotes:
tRNAscan-SE is used to predict tRNA genes. It does not require a high-performance computer.
Infernal is used with the Rfam database to predict all types of non-coding genes, and it is most accurate for rRNA and other non-coding RNA genes. It does not require a high-performance computer, but it takes a long time to run.
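Infernal's `cmscan` can write its hits to a space-delimited table with the `--tblout` option, which is convenient for filtering Rfam hits after a long run. The sketch below parses such a table and keeps hits under an E-value cutoff. It assumes the default tabular format (format 1), in which column 1 is the Rfam model name, column 2 its accession, column 3 the genome sequence, columns 8 to 10 the hit coordinates and strand, column 15 the bit score, and column 16 the E-value. The `CmscanHit` and `parse_cmscan_tblout` names, the 1e-5 cutoff, and the example rows are hypothetical choices for illustration, not Infernal defaults or real results.

```python
from dataclasses import dataclass


@dataclass
class CmscanHit:
    model: str       # column 1: Rfam model name, e.g. "5S_rRNA"
    accession: str   # column 2: Rfam accession, e.g. "RF00001"
    sequence: str    # column 3: genome sequence (contig) the hit lies on
    start: int       # column 8: "seq from" (larger than "seq to" on the minus strand)
    end: int         # column 9: "seq to"
    strand: str      # column 10: "+" or "-"
    score: float     # column 15: bit score
    evalue: float    # column 16: E-value


def parse_cmscan_tblout(lines, max_evalue=1e-5):
    """Parse cmscan --tblout (assumed format 1) lines; keep hits with E-value <= max_evalue."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header/footer
        # Column 18 (target description) may contain spaces, so split at most 17 times.
        fields = line.split(maxsplit=17)
        hit = CmscanHit(
            model=fields[0],
            accession=fields[1],
            sequence=fields[2],
            start=int(fields[7]),
            end=int(fields[8]),
            strand=fields[9],
            score=float(fields[14]),
            evalue=float(fields[15]),
        )
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return hits


if __name__ == "__main__":
    # Made-up rows in the assumed column layout, for illustration only.
    example = [
        "#target name  accession query name ...",
        "5S_rRNA RF00001 contig1 - cm 1 119 210052 209934 - no 1 0.54 0.1 74.3 2.2e-15 ! 5S ribosomal RNA",
        "tRNA RF00005 contig1 - cm 1 71 5000 5072 + no 1 0.60 0.0 40.1 3.0e-03 ! tRNA",
    ]
    for hit in parse_cmscan_tblout(example):
        print(hit.model, hit.sequence, hit.start, hit.end, hit.strand, hit.evalue)
```

For a real run you would pass an open file handle, for example `open("genome.tblout")` (a hypothetical file name), instead of the example list.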
Some programs are designed for eukaryotes:
BRAKER predicts protein-coding genes with the ab initio method and also uses extrinsic evidence from mapped RNA-seq reads and proteins to support the predictions and increase accuracy. It requires a high-performance computer.
GeMoMa predicts protein-coding genes by homology and can also use extrinsic evidence from mapped RNA-seq reads. It requires a high-performance computer.
One program is designed for prokaryotes:
Prokka predicts protein-coding and non-coding genes with the ab initio method and also uses extrinsic evidence from proteins. It does not require a high-performance computer.
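Among its outputs, Prokka writes the annotation as a GFF3 file, with the genome sequence appended after a `##FASTA` line. A quick sanity check on any run is to count the predicted features by type (CDS, tRNA, rRNA, and so on). The sketch below assumes standard 9-column, tab-separated GFF3, so it should also work on other GFF3 annotations; the function name, the example rows, and the file path in the usage note are hypothetical.

```python
from collections import Counter


def count_gff3_features(lines):
    """Count GFF3 features by their type (column 3), e.g. CDS, tRNA, rRNA."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        counts[columns[2]] += 1
    return counts


if __name__ == "__main__":
    # Made-up GFF3 rows for illustration only.
    example = [
        "##gff-version 3",
        "contig1\tProdigal\tCDS\t1\t300\t.\t+\t0\tID=A_00001",
        "contig1\tAragorn\ttRNA\t400\t475\t.\t+\t.\tID=A_00002",
        "contig1\tProdigal\tCDS\t500\t900\t.\t-\t0\tID=A_00003",
        "##FASTA",
        ">contig1",
        "ACGTACGT",
    ]
    for feature_type, n in count_gff3_features(example).most_common():
        print(feature_type, n)
```

On a real run, replace the example list with an open file, for example `open("annotation/mygenome.gff")` (a hypothetical path).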
Finally, BUSCO is used to evaluate the completeness of protein-coding gene predictions in both prokaryotes and eukaryotes. It does not require a high-performance computer.
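BUSCO summarizes its evaluation in a one-line notation such as `C:98.4%[S:97.6%,D:0.8%],F:0.4%,M:1.2%,n:255`, where C is the percentage of complete BUSCO genes (split into single-copy S and duplicated D), F fragmented, M missing, and n the number of BUSCO genes searched. The sketch below pulls those values out of the text of a short summary file so that several predictions can be compared. It assumes this classic notation; the pattern, function name, and example numbers are illustrative, not part of BUSCO itself.

```python
import re

# Matches the classic BUSCO notation, e.g. C:98.4%[S:97.6%,D:0.8%],F:0.4%,M:1.2%,n:255
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text):
    """Return the BUSCO percentages (C, S, D, F, M) as floats and n as an int."""
    match = BUSCO_PATTERN.search(text)
    if match is None:
        raise ValueError("no BUSCO summary line found")
    result = {key: float(match.group(key)) for key in ("C", "S", "D", "F", "M")}
    result["n"] = int(match.group("n"))
    return result


if __name__ == "__main__":
    # Made-up summary line, formatted like a line from a BUSCO short summary file.
    summary = parse_busco_summary("\tC:98.4%[S:97.6%,D:0.8%],F:0.4%,M:1.2%,n:255\t\n")
    print(f"{summary['C']}% complete out of {summary['n']} BUSCO genes")
```

Because `search` scans the whole text, you can pass the full contents of a short summary file rather than isolating the summary line first.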
All programs were screen-recorded on Ubuntu, one of the most widely used Linux distributions.
To follow the course hands-on, it is best to install Ubuntu on your computer or to create an Ubuntu virtual machine with VirtualBox.
By the end, you will have the protocols in text format, ready to apply to genomes from databases in a graduation project, a research paper, or a poster.