A two-phase clustering procedure based on allele specific expression

Pagliarini, Roberto (first author); Nascimben, Francesco (second author); Policriti, Alberto (last author)
2026-01-01

Abstract

Background: Allele-specific expression analysis is an important tool for integrating genome and transcriptome data. It quantifies expression variation between the two haplotypes of a diploid individual, distinguished by heterozygous sites, and is a powerful way to estimate the cis-regulatory diversity of alleles. Clustering algorithms can be used to identify patterns or groups of genes or samples based on their expression profiles. Depending on the structure of the data, different existing clustering algorithms can be adapted to allele-specific expression data. However, no ad hoc procedure has been developed.

Results: In this work, we begin by defining an expression matrix that captures allelic expression from an RNA-sequencing experiment. On this matrix, we develop a novel two-phase unsupervised clustering procedure, built on top of a spectral clustering algorithm, whose aim is to partition the population into groups of similar individuals according to their allelic expression. As case studies, the approach is used to cluster 98 cultivars representative of the variability observed in Vitis vinifera, starting from read counts of chromosome 1 genes in leaves, and to analyze allele-specific count data from a CASTxMRL F1 hybrid mouse dataset.

Conclusion: Using these real case studies as well as generated synthetic data, we see that our algorithm shows significant robustness and outperforms other standard clustering techniques.

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1327664