Challenges in Integration and Analysis of High-Dimensional Biological Data: Cases from Environmental and Health Research

  • Biological data represent a large and challenging sector of data engineering applications. They are typically complex and poorly standardized. Moreover, modern environmental and health research data are characterized by high value, rapid growth in volume, and advances in acquisition technologies, all of which strain classical practices for data transformation and analytics. Furthermore, biological data become more meaningful when integrated with other data types, or with data from different sources or even different fields. In addition, the uniqueness of each case and research question calls for a deep understanding of the data life cycle and for customized solutions. Because biological data are large in volume and value and are produced at high velocity and in great variety, they motivate the investigation of scalable workflows that automate acquisition and integration and close the gaps in optimizing analytics, especially for heterogeneous data. This thesis aims to explore and optimize state-of-the-art methods for the integration and analysis of heterogeneous sequence-based and non-sequence-based data by identifying four areas of application that involve primary and secondary data from environmental and health research. It presents four challenges in data preparation and transformation for variable selection, each with an accompanying case study. In particular, the thesis investigates knowledge extraction from primary, inherently high-dimensional marine sequence data; scalability in handling secondary photosynthetic sequence data; integration and statistical modeling of secondary high-dimensional relational health care claims data for adverse drug event prediction; and integration of heterogeneous primary epidemiological data for the investigation of childhood obesity. The thesis highlights the importance of data model development for data transformation and integration, and the role of scalable analytics given the foreseen increase in data dimensions.

Metadata
Publishing Institution: IRC-Library, Information Resource Center der Jacobs University Bremen
Granting Institution: Jacobs Univ.
Author: Mariam Reyad Rizkallah
Referee: Frank Oliver Glöckner, Iris Pigeot
Advisor: Adalbert F.X. Wilhelm
Persistent Identifier (URN): urn:nbn:de:gbv:579-opus-1010191
Document Type: PhD Thesis
Language: English
Date of Successful Oral Defense: 2022/01/31
Date of First Publication: 2022/03/21
Academic Department: Computer Science & Electrical Engineering
Focus Area: Mobility
PhD Degree: Data Engineering
Call No: 2022/3
