Correlation Browser for the bone dataset: Help/About

This website provides a tool to study correlations between genetic transcripts of the iliac crest bone, based on a previously published case-control study of N=84 post-menopausal Caucasian women [1]. The original data set is the result of a global gene expression analysis of 84 iliac crest bone biopsies using Affymetrix Human Genome U133 Plus 2.0 Arrays (quantitating more than 22 000 probesets) and Applied Biosystems PCR-based LDA cards (adding more than 300 microRNAs or other non-coding RNAs). In total, there are approximately 23 000 variables (M) per participant. All normalized Affymetrix signal values were log-transformed and the normalized PCR Cq values from the LDA cards were subtracted from 40 in order to get increasing values with increasing expression. Annotations were added to the M Affymetrix-based variables using the Bioconductor [2] package "hgu133plus2.db" for the statistical computing environment R [3]. These annotations include direct mappings of each Affymetrix probe label to, where available, exactly one...

corresponding NCBI Accession Number,
Entrez Gene ID,
official Gene Symbol and
the more descriptive Gene Name,

as well as to lists of potentially arbitrary length of...

Gene Ontology (GO) Terms,
KEGG Pathways and
chromosome start sites.

Note that any given gene may be associated with several Affymetrix labels. Note also that no annotations for microRNA were readily available for inclusion into the Correlation Browser.

Figure: The data table contains M expression measurements (Affymetrix and miRNA) from N study participants. Most of the M measurements are annotated, e.g. with gene symbols.
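The two value transformations and the shape of the annotations described above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the Correlation Browser (the original processing was done in R). The base of the logarithm is not stated on this page, so base 2 is an assumption, and all probe labels, identifiers and function names below are hypothetical placeholders.

```python
import math

CQ_CEILING = 40.0  # PCR Cq values are subtracted from 40, as described above


def affymetrix_value(normalized_signal):
    """Log-transform a normalized Affymetrix signal (base 2 is an assumption)."""
    return math.log2(normalized_signal)


def lda_value(normalized_cq):
    """Turn a normalized Cq into a value that increases with expression."""
    return CQ_CEILING - normalized_cq


# Hypothetical annotation records: single-valued fields hold at most one entry,
# while GO terms, KEGG pathways and chromosome start sites are lists.
annotations = {
    "probe_1": {
        "accession": "ACC_1", "entrez_id": "1001",
        "symbol": "GENE_A", "name": "hypothetical gene A",
        "go_terms": ["GO_term_1", "GO_term_2"],
        "kegg_pathways": ["pathway_1"],
        "chromosome_starts": [12345],
    },
    "probe_2": {
        "accession": "ACC_2", "entrez_id": "1001",
        "symbol": "GENE_A", "name": "hypothetical gene A",
        "go_terms": ["GO_term_1"],
        "kegg_pathways": [],
        "chromosome_starts": [12345, 67890],
    },
}


def probes_for_symbol(annotation_table, symbol):
    """Return all probe labels annotated with the given gene symbol."""
    return sorted(p for p, a in annotation_table.items() if a["symbol"] == symbol)


print(probes_for_symbol(annotations, "GENE_A"))
```

The lookup returns two labels for the same symbol, which mirrors the note above that one gene may be represented by several Affymetrix variables.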

From the original N×M data set, an M×M matrix of Pearson product-moment correlation coefficients r between all possible pairs of variables was calculated in the statistics environment R and saved column-wise in a MySQL database, along with the aforementioned annotations. The raw p-values were stored as well; each follows from its r-value and the number of participants N according to $t = \frac{r}{\sqrt{\frac{1 - r^2}{N - 2}}}$ under Student's t-distribution with N-2 degrees of freedom. The figure below illustrates a common query in the Correlation Browser: the user asks to view the correlations between a gene and a subset of all other variables. The program first identifies the variables associated with the gene and those in the requested subset. It then fetches the columns of r- and p-values corresponding to the former from the database and truncates them to the subset.

Figure: A query selects a subset of the large precalculated correlation matrix to be returned. These data are then enriched with p- and q-values.
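The calculation and the query described above can be sketched as follows. This is a minimal pure-Python illustration on made-up toy data, not the actual implementation (which uses R and MySQL). It assumes a two-sided test, which this page does not state explicitly, and it obtains the p-value by simple numerical integration of the t density, which loses precision for very small p-values. All function and variable names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation (undefined for constant input)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r / sqrt((1 - r^2) / (N - 2)), as in the formula above."""
    return r / math.sqrt((1.0 - r * r) / (n - 2))


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(r, n, steps=2000):
    """Two-sided p-value for r with N - 2 degrees of freedom (assumption)."""
    if abs(r) >= 1.0:
        return 0.0
    df = n - 2
    t = abs(t_statistic(r, n))
    h = t / steps
    # Simpson's rule for the integral of the density from 0 to |t|.
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2.0 * total * h / 3.0  # P(-|t| < T < |t|)
    return max(0.0, 1.0 - central_mass)


def correlation_columns(data):
    """Precalculate (r, p) for every pair, stored column by column."""
    names = list(data)
    n = len(data[names[0]])
    columns = {}
    for col in names:
        columns[col] = {}
        for row in names:
            r = pearson_r(data[col], data[row])
            columns[col][row] = (r, two_sided_p(r, n))
    return columns


def query(columns, gene_variables, subset):
    """Fetch the gene's columns and truncate them to the requested subset."""
    return [(col, row) + columns[col][row]
            for col in gene_variables for row in subset]


# Toy data: three variables measured in five hypothetical participants.
data = {
    "probe_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "probe_b": [2.1, 3.9, 6.2, 8.0, 9.8],
    "mirna_c": [5.0, 3.0, 4.0, 1.0, 2.0],
}
columns = correlation_columns(data)
for row in query(columns, ["probe_a"], ["probe_b", "mirna_c"]):
    print(row)
```

Storing the matrix by column means that a query for one gene only has to read the few columns belonging to that gene's variables, not the whole M×M matrix.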

The resulting list is sorted and expanded by calculating, from the raw p-values, the Bonferroni correction q and the Benjamini-Hochberg False-Discovery Rate [4]. The number of tests to correct for is assumed to be the number of correlation coefficients in the scope of the user's query, highlighted in purple in the figure above. In cases where the user's query includes the trivial "self correlations" on the diagonal of the matrix, these are removed (by default; can be toggled). Correlations between two (or more) variables that belong to the same gene, however, will be considered in the Bonferroni / BH corrections, but marked in gray in the output list. This behavior is intentional, because it is the expression variables (and not the genes they represent) that form the statistical basis for the whole correlation analysis.
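As an illustration of the two corrections, the following Python sketch removes self correlations, counts the remaining tests in the query, and computes Bonferroni- and Benjamini-Hochberg-adjusted values. It is not the Correlation Browser's own code. The row layout, the function names, the p-values and the choice to sort by ascending p-value are all assumptions made for the example.

```python
def drop_self_correlations(rows):
    """Remove the trivial diagonal entries (a variable against itself)."""
    return [row for row in rows if row[0] != row[1]]


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def correct_query(rows):
    """rows: (variable_a, variable_b, raw_p). Appends both corrections.

    The number of tests is the number of rows left in the query scope
    after the self correlations have been dropped.
    """
    kept = sorted(drop_self_correlations(rows), key=lambda row: row[2])
    p_values = [row[2] for row in kept]
    return [row + (b, f) for row, b, f in
            zip(kept, bonferroni(p_values), benjamini_hochberg(p_values))]


# Hypothetical query result: one self correlation and four real tests.
rows = [
    ("probe_1", "probe_1", 0.0),
    ("probe_1", "probe_2", 0.01),
    ("probe_1", "mirna_3", 0.04),
    ("probe_1", "probe_4", 0.03),
    ("probe_1", "probe_5", 0.005),
]
for row in correct_query(rows):
    print(row)
```

Pairs of variables that belong to the same gene are deliberately not filtered out here, in line with the behavior described above: they still count toward the number of tests.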

References

Correlation Browser 2.0: Daniel Sachse, Sjur Reppe, Kåre Gautvik - 2013.