Similarity Measures Based on the Overlap of Ranked Genes Are Effective for Comparison and Classification of Microarray Data

FOGOLARI, Federico
2016

Abstract

Similarity (or, conversely, distance) measures are at the heart of most bioinformatics applications. When the similarity involves only a small subset of features out of many, global similarity measures may be significantly affected by noise. Selecting only a subset of (putatively relevant) features for comparison is a widespread solution to this problem, although it involves arbitrary choices and manual intervention. The problem is becoming increasingly important as the amount of available experimental data grows. In recent years, measures based on ranking similarities between two datasets have been proposed. Here, we use one of the proposed rank similarity measures, which shares some aspects with the fraction enrichment score used for protein structure prediction and with gene set enrichment analysis, and test its performance in classifying experiments. The discrimination ability of the similarity measures based on the overlap of ranked genes tested here is comparable to or better than that of standard measures of similarity. This conclusion supports the use of rank-based proximity measures to gain further insight into dataset comparisons, particularly for expression data obtained by different technologies (e.g., RNA-seq and microarrays).
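
The general idea of a similarity based on the overlap of ranked genes can be illustrated with a minimal sketch. The abstract does not give the exact definition used in the paper, so everything below is an assumption made for illustration only: the function name `top_k_overlap`, the cutoff `k`, and the choice to score two expression profiles by the fraction of genes shared among their `k` highest-ranked entries.

```python
def top_k_overlap(profile_a, profile_b, k):
    """Fraction of genes shared by the k top-ranked entries of two profiles.

    Hypothetical illustration: both profiles list expression values for the
    same genes in the same order; genes are ranked by decreasing value.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genes")
    if not 1 <= k <= len(profile_a):
        raise ValueError("k must be between 1 and the number of genes")

    def top_indices(profile):
        # Rank gene indices by decreasing expression; ties keep input order.
        ranked = sorted(range(len(profile)), key=lambda i: -profile[i])
        return set(ranked[:k])

    shared = top_indices(profile_a) & top_indices(profile_b)
    return len(shared) / k


# Toy values (not data from the paper): six genes measured in two experiments.
experiment_1 = [9.1, 0.2, 7.5, 0.3, 0.1, 6.8]
experiment_2 = [8.0, 5.5, 7.0, 0.4, 6.1, 0.2]
similarity = top_k_overlap(experiment_1, experiment_2, k=3)
```

Because only ranks enter the score, a measure of this kind does not depend on the absolute scale of the two experiments, which is one reason rank-based comparisons are attractive across different technologies.
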
Type: Pre-print document
License: Not public
Use this identifier to cite or link to this document: http://hdl.handle.net/11390/1102988