Data integration in microbial genomics: Contextualizing sequence data in aid of biological knowledge

  • Deoxyribonucleic acid (DNA) is the primary structure that carries the genetic information of organisms in genomes. The introduction of the first DNA sequencing methods in 1977 marked a major breakthrough in the life sciences. Today, these methods are widely applied and grant insight into the 'blueprints' of organisms from all domains of life. The analysis of environmental microbial sequence data is becoming increasingly important in times of global climate change, because microbes are central catalysts in nutrient cycles such as the carbon cycle, which profoundly affects Earth's climate. Microbes perform almost all metabolic processes that are thermodynamically possible. DNA sequencing is carried out around the globe, and the resulting data are submitted to the public repositories of the International Nucleotide Sequence Database Collaboration (INSDC). Data in the INSDC are accumulating exponentially. This trend shows the need for efficient data processing strategies to gain knowledge from this ever-increasing amount of sequence data. For this, it is important to annotate sequence data with as much contextual data as possible. Contextual data are data about the environmental context of a sample and the processing steps that were applied to it. They range from data about the geographic location, sampling time, and habitat, or about the experimental procedures used to obtain the sequences, up to video data recorded during sampling. Data about the geographic location (x, y, z) and the point in time (t) at which samples are taken from the environment are especially essential. With these data, comparability and interpretability are preserved. Many analysis approaches become possible when contextual and sequence data are integrated. In this doctoral thesis, data integration is promoted in three ways: first, through the development of contextual data capture, submission, and integration tools; second, through the development of standards for contextual data; and third, through the demonstration of in silico hypothesis generation for a large metagenomic data set.

Meta data
Publishing Institution: IRC-Library, Information Resource Center der Jacobs University Bremen
Granting Institution: Jacobs Univ.
Author: Wolfgang Matthias Hankeln
Referee: Frank Oliver Glöckner, Peter Baumann, Wolfgang Ludwig
Advisor: Frank Oliver Glöckner
Persistent Identifier (URN): urn:nbn:de:101:1-2013052811990
Document Type: PhD Thesis
Date of Successful Oral Defense: 2011/05/30
Date of First Publication: 2011/07/04
PhD Degree: Bioinformatics
School: SES School of Engineering and Science
Other Organisations Involved: Max-Planck-Institut für Marine Mikrobiologie (marmic)
Library of Congress Classification: Q Science / QH Natural history - Biology / QH301-705.5 Biology (General) / QH324 Methods of research. Technique. Experimental biology / QH324.2 Data processing. Bioinformatics
Call No: Thesis 2011/25
