Genomic law guided gene prediction in fungi and metazoans (Article)

cited authors

  • Fang, Y; Li, J

abstract

  • Computational prediction of protein-coding genes is a fundamental step in genome annotation, yet accurately predicting eukaryotic genes in silico remains a challenge. By surveying the model genomes, we found that Spearman's rank correlation coefficient between the number of experimentally verified genes and genome size was 0.96 for all eukaryotes except plants, indicating that the relationship between genome size and the number of coding genes can be expressed as a monotonic function. Regression analysis showed that the total number of protein-coding genes follows a logarithmic function of genome size. We integrated this equation into ab initio gene prediction software to guide gene prediction by constraining the total number of predicted genes. We evaluated the software on three eukaryotic genomes. The results showed that >90% of false positive predictions were removed while >80% of true positives were retained, resulting in much higher specificity. © 2013 Inderscience Enterprises Ltd.
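
The workflow described in the abstract (rank correlation, a logarithmic fit, then a cap on the number of predicted genes) can be illustrated with a minimal sketch. It is not the authors' software. The abstract gives neither the fitted coefficients nor the way the constraint is applied, so the sketch makes several assumptions: the fitted form is `genes = a * ln(size) + b`, the constraint keeps the `k` highest-scoring predictions, and the function names (`spearman`, `fit_log`, `constrain`) and all input values are hypothetical.

```python
import math

def ranks(values):
    # Assign ranks 1..n by ascending value (assumes no ties in the input).
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman's rho using the no-ties formula: 1 - 6*sum(d^2) / (n*(n^2 - 1)).
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def fit_log(genome_sizes, gene_counts):
    # Ordinary least squares for the assumed form: genes = a * ln(size) + b.
    u = [math.log(g) for g in genome_sizes]
    n = len(u)
    mu = sum(u) / n
    mv = sum(gene_counts) / n
    sxy = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, gene_counts))
    sxx = sum((ui - mu) ** 2 for ui in u)
    a = sxy / sxx
    b = mv - a * mu
    return a, b

def constrain(predictions, genome_size, a, b):
    # Assumed constraint: keep only the k highest-scoring predictions,
    # where k is the gene count expected from the fitted equation.
    # Each prediction is a (gene_id, score) pair.
    k = max(0, round(a * math.log(genome_size) + b))
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    # Synthetic, made-up numbers for illustration only (not data from the paper).
    sizes = [12.0, 40.0, 100.0, 350.0, 1200.0]   # genome size, arbitrary units
    genes = [5000, 7500, 9000, 12500, 15000]     # gene counts, invented
    print("rho =", spearman(sizes, genes))
    a, b = fit_log(sizes, genes)
    print("a =", a, "b =", b)
```

Keeping the top `k` predictions by score is only one way to enforce a total-count constraint. A real ab initio predictor might instead adjust its internal score threshold until the number of predicted genes matches the expected count.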

publication date

  • January 1, 2013

Digital Object Identifier (DOI)

start page

  • 157

end page

  • 169

volume

  • 6

issue

  • 1-2