Similarity (or conversely distance) measures are at the heart of most bioinformatic applications. When the similarity involves only a small subset of features out of many, global similarity measures may be significantly affected by noise. Selecting only a subset of (putatively relevant) features for comparison is a widespread solution to the problem albeit affected by arbitrariness and manual intervention. The problem is becoming more and more important due to the increasing amount of experimental data available. In recent years measures based on ranking similarities between two datasets have been proposed. Here, we use one of the proposed rank similarity measures, sharing some aspects with the fraction enrichment score used for protein structure prediction and the gene set enrichment analysis, and test its performance in classifying experiments. The discrimination ability of the similarity measures based on the overlap of ranked genes tested here compares well or better with standard measures of similarity. This conclusion supports the use of rank-based proximity measures to gain further insight in dataset comparisons, particularly on expression data obtained by different techonologies (e.g., RNA-seq and microarrays).
Similarity Measures Based on the Overlap of Ranked Genes Are Effective for Comparison and Classification of Microarray Data
FOGOLARI, Federico
2016-01-01
Abstract
Similarity (or conversely distance) measures are at the heart of most bioinformatic applications. When the similarity involves only a small subset of features out of many, global similarity measures may be significantly affected by noise. Selecting only a subset of (putatively relevant) features for comparison is a widespread solution to the problem albeit affected by arbitrariness and manual intervention. The problem is becoming more and more important due to the increasing amount of experimental data available. In recent years measures based on ranking similarities between two datasets have been proposed. Here, we use one of the proposed rank similarity measures, sharing some aspects with the fraction enrichment score used for protein structure prediction and the gene set enrichment analysis, and test its performance in classifying experiments. The discrimination ability of the similarity measures based on the overlap of ranked genes tested here compares well or better with standard measures of similarity. This conclusion supports the use of rank-based proximity measures to gain further insight in dataset comparisons, particularly on expression data obtained by different techonologies (e.g., RNA-seq and microarrays).File | Dimensione | Formato | |
---|---|---|---|
j_comp_biol_23_xx.pdf
non disponibili
Tipologia:
Documento in Pre-print
Licenza:
Non pubblico
Dimensione
350.2 kB
Formato
Adobe PDF
|
350.2 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.