Answers to Frequently Asked Questions Regarding Gene Ontology a

  • "Big data", "omics" and "data mining" are terms we have heard often in recent years, and next-generation (second-generation) sequencing is now widely used in scientific research. Whichever sequencing company performs the sequencing or data analysis, one standard analysis appears in almost every final report: functional enrichment analysis. Because many researchers know little about this method, we have collected and answered some of the most common questions about enrichment analysis, with a focus on the questions people most often ask about GO enrichment analysis.

     

    1. What is the principle of enrichment analysis?

    Gene function enrichment analysis uses various databases and analysis tools to statistically identify the gene function categories in those databases that are significantly associated with the biological question being studied.

     

    Its statistical principle is to use the hypergeometric distribution to test whether a given functional category is significantly over-represented in a group of genes (for example, co-expressed or differentially expressed genes) compared with what would be expected by chance. By combining this significance test on a discrete distribution with a correction for false positives across all tested categories, the analysis identifies targeted functional categories that are significantly related to the experimental question and have a low false-positive rate.
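For reference, the one-sided (over-representation) hypergeometric test described above can be written as follows. This is a sketch in our own notation: N is the number of genes in the background, K the number of background genes annotated to the functional category, n the size of the gene set of interest, and k the number of those genes annotated to the category. The Benjamini-Hochberg adjustment is shown as one common way to control false positives over m tested categories; individual tools may use a different correction.

```latex
P(X \ge k) = \sum_{i=k}^{\min(K,\,n)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}

% Benjamini-Hochberg adjustment for m tested categories,
% with p-values sorted so that p_{(1)} \le \dots \le p_{(m)}:
q_{(j)} = \min_{l \ge j} \, \min\!\left(1, \frac{m \, p_{(l)}}{l}\right)
```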

     

    2. How is functional enrichment analysis performed?

    There are many algorithms for functional enrichment analysis and many tools that implement them, such as DAVID, PANTHER, GenMAPP, topGO and ClueGO.

     

    This overview of enrichment analysis provides the background needed to understand GO (Gene Ontology) enrichment analysis.

     

    3. What is GO?

    GO (Gene Ontology) is a database established by the Gene Ontology Consortium. It aims to provide a standard semantic vocabulary that applies across species, defines and describes the functions of genes and proteins, and can be updated as research progresses. By using a dynamically maintained, controlled vocabulary to describe the roles of genes and proteins in cells, GO aims to fully describe the attributes of genes and gene products in organisms.

     

    The GO database is divided into three categories: Biological Process (BP), Cellular Component (CC) and Molecular Function (MF). These describe, respectively, the biological processes a gene product takes part in, the cellular environment in which it is located, and the molecular functions it can perform. The basic unit of GO is a node, also called a GO term. Each node has a name, such as "Cell", "Fibroblast Growth Factor Receptor Binding" or "Signal Transduction", and a unique identifier of the form "GO:nnnnnnn". Starting from the identified protein IDs, the GO annotations of each protein are retrieved from the UniProt database by ID mapping, and the proteins are then classified and annotated by function. For each GO term involved in BP, CC and MF, the number of corresponding proteins is listed, and summary charts are produced for the second-level GO classification of the expressed proteins.
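The counting step described above can be sketched in Python. This is a minimal illustration, assuming the protein-to-GO mapping has already been retrieved (for example from UniProt) and is held in memory. The protein IDs and their annotations are invented placeholders, the four GO IDs are real terms used only as example vocabulary, and the function name `count_proteins_per_term` is hypothetical.

```python
from collections import defaultdict

# Example GO vocabulary: GO ID -> (term name, category).
GO_TERMS = {
    "GO:0007165": ("signal transduction", "BP"),
    "GO:0005737": ("cytoplasm", "CC"),
    "GO:0005634": ("nucleus", "CC"),
    "GO:0005515": ("protein binding", "MF"),
}

# Hypothetical protein -> GO annotations, standing in for the result of
# mapping identified protein IDs against UniProt. IDs are placeholders.
PROTEIN_GO = {
    "protA": {"GO:0007165", "GO:0005737", "GO:0005515"},
    "protB": {"GO:0005634", "GO:0005515"},
    "protC": {"GO:0005737"},
}


def count_proteins_per_term(protein_go, go_terms):
    """Return {category: {GO ID: number of proteins annotated to it}}."""
    counts = defaultdict(dict)
    for go_ids in protein_go.values():
        # go_ids is a set, so each protein is counted at most once per term.
        for go_id in go_ids:
            _, category = go_terms[go_id]
            counts[category][go_id] = counts[category].get(go_id, 0) + 1
    return dict(counts)


if __name__ == "__main__":
    result = count_proteins_per_term(PROTEIN_GO, GO_TERMS)
    for category in sorted(result):
        for go_id, n in sorted(result[category].items()):
            print(category, go_id, GO_TERMS[go_id][0], n)
```

The sketch counts only the terms each protein is directly annotated with; many tools also propagate annotations to parent terms in the GO hierarchy before counting.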

     

    4. What is GO enrichment analysis?

    GO enrichment analysis uses the gene annotation information in the GO database to perform enrichment analysis on a set of genes. Its results include GO functional classification results and GO functional enrichment results.

     

    • GO functional classification: counts the number or proportion of proteins or genes annotated at a given functional level.
    • GO functional enrichment: identifies functional categories that are significantly enriched relative to the reference (background) gene set.
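To make the distinction concrete: a classification result reports only how many genes fall in a category, while enrichment compares that category's share of the study set with its share of the reference set. The sketch below computes this ratio, often reported as fold enrichment. The function name and the example numbers are hypothetical, and the ratio alone says nothing about significance, which still requires a statistical test such as the hypergeometric test.

```python
def fold_enrichment(k, n, K, N):
    """Share of a term in the study set divided by its share in the reference set.

    k: study genes annotated to the term
    n: size of the study set
    K: reference genes annotated to the term
    N: size of the reference set
    """
    if n <= 0 or K <= 0:
        raise ValueError("need a non-empty study set and a term present in the reference")
    study_share = k / n          # what a classification count becomes as a proportion
    reference_share = K / N      # expected proportion under no enrichment
    return study_share / reference_share


if __name__ == "__main__":
    # Hypothetical counts: 12 of 200 differential genes vs 300 of 20,000 reference genes.
    print(fold_enrichment(12, 200, 300, 20000))
```

In this hypothetical example the study share is 0.06 against 0.015 in the reference, a fold enrichment of 4.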

     

    5. What is the analysis principle of GO enrichment analysis?

    Given the selected differential genes, a hypergeometric test is used to assess the relationship between these genes and each specific branch (GO term) of the GO classification. GO analysis returns a p-value for each GO term that contains differential genes; a small p-value indicates that the differential genes are significantly enriched in that GO term.
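The per-term procedure described above can be sketched in Python using only the standard library. This is a minimal illustration, not the implementation of any particular tool: it assumes the one-sided hypergeometric test, applies Benjamini-Hochberg adjustment as one common way to control false positives, and the function names (`hypergeom_upper_tail`, `benjamini_hochberg`, `go_enrichment`), the input layout and the term IDs in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def go_enrichment(diff_genes, background, term_genes):
    """Test every GO term that contains at least one differential gene.

    diff_genes: set of differential gene IDs
    background: set of all reference gene IDs
    term_genes: dict mapping GO ID -> set of genes annotated to that term
    Returns dict GO ID -> (k, K, p_value, adjusted_p).
    """
    diff = diff_genes & background  # ignore genes outside the background
    N, n = len(background), len(diff)
    tested = []
    for go_id, genes in term_genes.items():
        annotated = genes & background
        k = len(annotated & diff)
        if k == 0:
            continue  # terms without differential genes get no p-value
        K = len(annotated)
        tested.append((go_id, k, K, hypergeom_upper_tail(k, N, K, n)))
    adjusted = benjamini_hochberg([t[3] for t in tested])
    return {go_id: (k, K, p, q) for (go_id, k, K, p), q in zip(tested, adjusted)}


if __name__ == "__main__":
    background = {f"g{i}" for i in range(20)}
    diff = {f"g{i}" for i in range(5)}
    terms = {
        "termA": {"g0", "g1", "g2", "g3", "g10"},
        "termB": {f"g{i}" for i in range(5, 15)},
        "termC": {"g4"} | {f"g{i}" for i in range(15, 20)},
    }
    for go_id, (k, K, p, q) in sorted(go_enrichment(diff, background, terms).items()):
        print(f"{go_id}: k={k}, K={K}, p={p:.4g}, adjusted={q:.4g}")
```

In the toy example, termB contains no differential genes and is skipped, while termA (4 of its 5 genes are differential, in a 20-gene background with 5 differential genes) gets p = 76/15504, about 0.0049. Exact integer arithmetic via `math.comb` avoids overflow but may be slower than a dedicated statistics library for large analyses.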

     

    6. What online GO enrichment analysis tools are currently available?

    Many websites provide GO enrichment analysis. Most of them require you to submit a gene list and then return the results online. One example is DAVID (The Database for Annotation, Visualization and Integrated Discovery): https://david.ncifcrf.gov/

     

    CD ComputaBio's GO enrichment analysis can significantly reduce the cost and labor of subsequent experiments. GO enrichment analysis is a personalized, customized research service: each project is evaluated before the corresponding analysis plan and price are determined. If you would like to know more about pricing or technical details, please feel free to contact us.