How AnnoMiner Works

Annotation

Enrichment



Choose the function of AnnoMiner that you want to investigate!

Input

Gene List



For the enrichment analysis, upload a list of gene IDs. AnnoMiner will automatically convert your IDs to match the resource of interest.
Remember to upload a list of at least 10 known IDs to perform the analysis!



Bed file



BED (Browser Extensible Data) is a plain-text format in which the fields of each feature are separated by a tab ("\t"). The first three BED fields are mandatory:


  • chrom - The name of the chromosome
  • chromStart - The starting position of the feature in the chromosome
  • chromEnd - The ending position of the feature in the chromosome

This file can contain genomic regions defining, for example, transcription factor binding sites or methylation sites obtained from ChIP-seq.
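As an illustration of the three mandatory fields, here is a minimal Python sketch that reads them from tab-separated text. The names `BedRegion` and `parse_bed` are hypothetical and are not part of AnnoMiner; skipping `#`, `track` and `browser` lines is an assumption about what an input file might contain.

```python
from typing import List, NamedTuple


class BedRegion(NamedTuple):
    chrom: str
    start: int  # chromStart
    end: int    # chromEnd


def parse_bed(text: str) -> List[BedRegion]:
    """Read the three mandatory BED fields from tab-separated text."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: skip blank lines and optional header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 tab-separated fields: {line!r}")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start > end:
            raise ValueError(f"chromStart is greater than chromEnd: {line!r}")
        regions.append(BedRegion(chrom, start, end))
    return regions


# Extra columns after the first three (here "peak_2") are simply ignored.
example = "chr1\t1000\t1500\nchr2\t200\t900\tpeak_2\n"
regions = parse_bed(example)
```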


You can also upload more than one BED file and integrate them!



Custom annotation (Optional)



A custom annotation file is a plain-text file in which the fields of each feature are separated by a tab ("\t") or a comma (","). Two formats are available: tsv (tab-separated values) and csv (comma-separated values).

The first field (column) of the custom dataset must be the gene symbol (AnnoMiner will then automatically convert the IDs to match the selected resource of interest).

The other fields are fully customizable: the user can choose the number of columns (up to six) and the label assigned to each. The user can also specify the type of content (text or numeric) so that the results are ordered correctly.

With this option the user can upload custom data and use it to annotate the results of the different peak annotation analyses, for example with differential expression analysis results, clustering results, custom annotations, etc.
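To show why the content type matters for ordering, here is a small Python sketch that reads a tab-separated table and casts one column to numbers. The function `read_annotation`, the column labels and the presence of a header row are assumptions made for this example, not a description of how AnnoMiner parses the file.

```python
import csv
import io
from typing import Dict, List


def read_annotation(text: str, delimiter: str, numeric: List[str]) -> List[Dict[str, object]]:
    """Parse a table with a header row; cast the columns named in `numeric` to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        parsed: Dict[str, object] = dict(row)
        for column in numeric:
            parsed[column] = float(row[column])
        rows.append(parsed)
    return rows


# Hypothetical file: gene symbol first, then two user-defined columns.
tsv = "symbol\tlog2FC\tcluster\nGATA1\t-2.5\tA\nTP53\t10.0\tB\nMYC\t9.0\tA\n"
rows = read_annotation(tsv, delimiter="\t", numeric=["log2FC"])

# Numeric ordering of log2FC: -2.5 < 9.0 < 10.0
by_value = [r["symbol"] for r in sorted(rows, key=lambda r: r["log2FC"])]

# Text ordering compares character by character, so "10.0" sorts before "9.0".
by_text = sorted(["-2.5", "10.0", "9.0"])
```

Declaring a column as numeric avoids the second, misleading ordering.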


What AnnoMiner does

Enrichment Analysis



This analysis evaluates whether the genes in the uploaded list are potentially co-regulated by a TF.
It uses a hypergeometric test, which tests whether a given TF binds the promoters of the genes in the user-provided list more often than expected from the background (the whole genome).
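The idea behind the test can be sketched in a few lines of Python. This is a generic one-sided hypergeometric tail probability written with the standard library, not AnnoMiner's own code; the function and parameter names are chosen for this example.

```python
from math import comb


def hypergeom_pvalue(list_hits: int, list_size: int,
                     genome_hits: int, genome_size: int) -> float:
    """P(X >= list_hits) when list_size promoters are drawn from genome_size,
    of which genome_hits are bound by the TF."""
    total = comb(genome_size, list_size)
    upper = min(list_size, genome_hits)
    # Sum the probability of observing list_hits or more bound promoters.
    tail = sum(
        comb(genome_hits, k) * comb(genome_size - genome_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total


# Toy numbers: 20 promoters in the background, 5 bound by the TF,
# 4 promoters in the uploaded list, 2 of them bound.
p = hypergeom_pvalue(list_hits=2, list_size=4, genome_hits=5, genome_size=20)
```

A small p-value indicates that the TF binds the promoters in the list more often than a random draw from the background would suggest.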

The test is performed on all the transcription factor binding site data (TF ChIP-seq) available in the ENCODE/modERN resources (last update: January 2020).


The Dynamic Ranges option automatically detects from the data the optimal threshold for defining the upstream promoter boundary for each TF. Alternatively, the user can define it manually. The user can also choose the minimum overlap (bp or %) required to consider a binding event biologically relevant.

4 Types of Analysis



AnnoMiner annotates your regions by mapping them against the selected annotated genome in four different ways:


  • Peak annotation - The user defines several genomic features: upstream flanking regions, downstream flanking regions, and the promoter region around the TSS. These, in addition to the features present in the annotated genome (5' UTR, CDS, 3' UTR), allow the user to annotate the uploaded genomic coordinates with the corresponding gene features.
  • Nearby genes annotation - For this analysis the user must also provide differential expression results (the geneID and log2FC fields are mandatory). The user can then choose to visualize either the overall amount of deregulation or the overall directionality of the deregulation for the five closest genes upstream and downstream of the uploaded genomic region(s).
  • Long range interactions - For this analysis the user must also provide differential expression results (the geneID and log2FC fields are mandatory). The user can then choose to visualize either the overall amount of deregulation or the overall directionality of the deregulation for the genes that fall within a user-defined search window around the uploaded genomic region(s).
  • Peak integration - This function is performed on multiple (up to 5) user-provided BED files. The user can then investigate co-occurring epigenetic events (for example, TF binding sites and histone marks).

For each of these functions the user also specifies the minimum overlap required between the uploaded region and the gene feature for a binding event to be considered biologically relevant.
The user can also choose whether the analysis takes into account all transcripts or only the longest (canonical) one per gene, and whether to consider gene directionality.
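The minimum-overlap criterion can be pictured with the following Python sketch. It assumes half-open (start, end) coordinates as in BED and, for the percentage option, measures the overlap relative to the length of the uploaded region. Both choices, and the function names, are assumptions made for this example; AnnoMiner may define the percentage differently.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start, end), assumed half-open as in BED


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of base pairs shared by two intervals on the same chromosome."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def meets_min_overlap(region: Interval, feature: Interval,
                      min_bp: Optional[int] = None,
                      min_percent: Optional[float] = None) -> bool:
    """Check the overlap against a threshold given in bp or in percent."""
    shared = overlap_bp(region, feature)
    if min_bp is not None:
        return shared >= min_bp
    if min_percent is not None:
        # Assumption: percentage is relative to the uploaded region's length.
        region_length = region[1] - region[0]
        return region_length > 0 and 100.0 * shared / region_length >= min_percent
    # No threshold given: any overlap counts.
    return shared > 0


peak = (100, 200)      # uploaded region, 100 bp long
promoter = (150, 300)  # gene feature; shares 50 bp with the peak
```

With these toy intervals, a threshold of 60 bp rejects the pair, while a threshold of 50% accepts it.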


Output

Barplot and Datatable



At the end of the enrichment analysis, the results are shown in a downloadable datatable and an interactive barplot.



The barplot shows the top 10 results ranked by combined-score. Hovering over the bar corresponding to a TF of interest shows information about the underlying TF ChIP-seq experiment.




In the datatable, the following columns are present for each transcription factor:

  • Target factor: antibody target of the ChIP-seq experiment used to determine the binding sites of the transcription factor of interest
  • Probe: Information about the probe
  • Treatment: Information about the treatment (data from H. sapiens experiments, source ENCODE)
  • Replicate: Information about the replicates (source ENCODE)
  • List Hits: number of genes in the uploaded list that have the specified transcription factor binding site in the promoter region
  • Genome Hits: number of genes in the genome that have the specified transcription factor binding site in the promoter region
  • Target Genes: Gene names of the List Hits genes
  • Score: Score of the hypergeometric test computed as: (List Hits/List Size)/(Genome Hits/Genome Size)
  • P-value: p-value of the hypergeometric test
  • FDR: adjusted p-value (Benjamini-Hochberg)
  • combined-score: score * -log10(p-value)
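The Score and combined-score formulas listed above, together with a Benjamini-Hochberg adjustment, can be written out as a short Python sketch. The two score functions follow the formulas given in the list. The `benjamini_hochberg` function is the textbook step-up procedure and is only an assumption about how the FDR column could be obtained; all names are chosen for this example.

```python
from math import log10
from typing import List


def enrichment_score(list_hits: int, list_size: int,
                     genome_hits: int, genome_size: int) -> float:
    """(List Hits / List Size) / (Genome Hits / Genome Size)."""
    return (list_hits / list_size) / (genome_hits / genome_size)


def combined_score(score: float, p_value: float) -> float:
    """score * -log10(p-value)."""
    return score * -log10(p_value)


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Textbook Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy numbers: 2 of 4 list genes are bound, against 5 of 20 in the background.
score = enrichment_score(list_hits=2, list_size=4, genome_hits=5, genome_size=20)
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A score above 1 means that the TF binds the promoters in the list proportionally more often than in the background. Multiplying by -log10(p-value) favours results that are both strongly enriched and statistically significant.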


The datatable is fully responsive: it can be sorted, searched for genes or transcription factors of interest, and used to check the target genes of a specific transcription factor directly. Hovering the mouse over a field of interest shows a short description of the meaning of each value.


Note: The statistical test is based on all the transcripts belonging to each gene. For this reason, the Genome Size and the List Size will be larger than, respectively, the number of genes in the genome of the organism of interest and the size of the uploaded list.


For each function, once the user has selected the region of interest, a downloadable table containing the corresponding genes and transcripts is generated.

The table is also responsive: it can be sorted and searched for genes of interest, and clicking a gene of interest redirects the user to the corresponding entry in the selected resource.


Optionally, this table will also be annotated with the data contained in the user provided custom annotation file.

