Computational methods and pipelines for the analysis of next generation sequencing (NGS) data and pathway annotation / Fabrizio Serra. Udine, 2016 Apr 01. 28th doctoral cycle.

Computational methods and pipelines for the analysis of next generation sequencing (NGS) data and pathway annotation

SERRA, Fabrizio
2016-04-01

Abstract

During the first year of my PhD, I worked on the evaluation of similarity measures commonly used in bioinformatics applications. The growing amount of data in public databases calls for tools to analyse them, which makes the proper evaluation of similarity increasingly important. Among the methodologies available for evaluating proximity measures, we focused on the concept of intrinsic separation ability, i.e. how well a distance discriminates between objects belonging to different classes. The work I carried out with Prof. Fogolari aimed to find the best similarity measure by comparing known proximity measures with the fraction enrichment proximity score (FES), which we developed, both to assess the similarity among experiments and to identify the genes that contribute most to that similarity.

During the second year, under the supervision of Prof. Romualdi (University of Padua), I built a ChIP-seq data analysis pipeline that exploits the heterogeneity of different algorithms, with the aim of extending the pathway annotation of graphite, a Bioconductor package. Specifically, given the ChIP-seq results for a transcription factor, the pathway annotation was expanded by adding the transcription factor as a new node to the network of each pathway in which its target genes were already annotated. For this purpose, ENCODE ChIP-seq datasets, an important resource for improving pathway annotation, were used.

During the third year, in collaboration with Prof. Tell, I worked on ChIP-seq and RIP-seq data analysis. First, I built a ChIP-seq data analysis pipeline to identify the target genes directly regulated by APE1 under oxidative stress conditions; the target genes were identified through ChIP-seq analysis of APE1's preferential promoter binding sites. Then, using RIP-seq data, I investigated the biological significance of the RNAs bound by APE1 with several online tools. Gene Ontology analysis of biological functions was performed with DAVID (https://david.ncifcrf.gov/), protein-protein interactions were identified with STRING (http://string-db.org), and miRNA/mRNA targets were identified by data mining in miRGate (http://mirgate.bioinfo.cnio.es).

I was also involved in a project that used data mining to study the MEF2A binding sites, both common and exclusive, of the GM12878 (lymphoid) and K562 (myeloid) cell lines.
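The notion of intrinsic separation ability can be made concrete in several ways. The following is one hypothetical operationalisation, not the measure used in the thesis. It scores a distance by the probability that a randomly chosen same-class pair is closer than a randomly chosen different-class pair (an AUC-style statistic, with ties counted as one half). The function name `separation_auc` and the toy points are illustrative assumptions.

```python
import math
from itertools import combinations
from typing import Callable, Sequence

Point = Sequence[float]


def separation_auc(
    points: Sequence[Point],
    labels: Sequence[str],
    distance: Callable[[Point, Point], float],
) -> float:
    """Hypothetical separation score: P(within-class distance < between-class distance).

    Ties count as 0.5, so 1.0 means perfect separation and 0.5 means chance level.
    """
    within = []
    between = []
    # Collect all pairwise distances, split by whether the pair shares a class.
    for i, j in combinations(range(len(points)), 2):
        d = distance(points[i], points[j])
        (within if labels[i] == labels[j] else between).append(d)
    if not within or not between:
        raise ValueError("need at least one within-class and one between-class pair")
    # Compare every within-class distance with every between-class distance.
    score = 0.0
    for w in within:
        for b in between:
            if w < b:
                score += 1.0
            elif w == b:
                score += 0.5
    return score / (len(within) * len(between))


if __name__ == "__main__":
    # Toy data (illustrative only): two well-separated 2-D clusters.
    pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    labs = ["A", "A", "B", "B"]
    print(separation_auc(pts, labs, math.dist))  # 1.0
```

On the toy data, both classes form tight, distant clusters, so the score is 1.0. A value near 0.5 would indicate a distance that separates the classes no better than chance. To compare different measures on the same labelled data, pass a different callable in place of `math.dist`, for example a correlation-based distance.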
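A minimal sketch can illustrate the pathway-extension step described for the second year. It does not use graphite's own API, which is an R/Bioconductor package. Instead, it makes three assumptions: a pathway is a set of directed gene-gene edges, a ChIP-seq result is a plain set of target genes, and the transcription factor is added as a new node with one edge to each of its targets already present in the pathway. The `Pathway` class, the `add_tf_node` function, the `min_targets` threshold and the TF-to-target edge direction are all hypothetical choices, not details taken from the thesis. The code requires Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class Pathway:
    """Hypothetical pathway model: a name plus directed (source, target) gene edges."""

    name: str
    edges: set[tuple[str, str]] = field(default_factory=set)

    @property
    def genes(self) -> set[str]:
        # Nodes are every gene that appears at either end of an edge.
        return {gene for edge in self.edges for gene in edge}


def add_tf_node(
    pathway: Pathway, tf: str, chip_targets: set[str], min_targets: int = 1
) -> Pathway:
    """Return a copy of `pathway` with `tf` linked to its ChIP-seq targets already in it.

    If fewer than `min_targets` targets are annotated in the pathway, an
    unchanged copy of the pathway is returned.
    """
    hits = chip_targets & pathway.genes
    if len(hits) < min_targets:
        return Pathway(pathway.name, set(pathway.edges))
    # New TF node: one regulatory edge TF -> target for each annotated target.
    new_edges = {(tf, gene) for gene in hits if gene != tf}
    return Pathway(pathway.name, pathway.edges | new_edges)


if __name__ == "__main__":
    # Toy pathway and toy ChIP-seq target set (illustrative, not real data).
    toy = Pathway("toy_pathway", {("G1", "G2"), ("G2", "G3"), ("G3", "G4")})
    extended = add_tf_node(toy, "TF_X", {"G2", "G4", "G9"}, min_targets=2)
    print(sorted(extended.edges))
```

The function returns a new `Pathway` instead of mutating its input, so the original annotation stays available for comparison with the extended one. Target genes that are absent from the pathway (here `G9`) are ignored, in line with the idea of linking the transcription factor only to genes that are already annotated.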
Keywords: Next generation sequencing; ChIP-seq; RIP-seq; Fraction Enrichment Score
Files in this item:
10990_678_TESI_DOTTORATO_FABRIZIO_SERRA.pdf (5.12 MB, Adobe PDF)
Open Access from 2 October 2017
Type: Doctoral thesis
License: Not specified

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1132211