Enhancing XML Preservation and Workflows

With the proliferation of computers and networked resources, the amount of data available on personal computers and on the Web is growing exponentially. Handling data becomes more complex as the scale expands. Moreover, data collections change in place over time, so one important challenge is supporting the life cycle of data. In particular, relevant parts of information have to be preserved and made accessible for future retrieval. XML is a widespread language for encoding documents that is both machine-readable and human-readable. It plays a unique role on the Web and in other areas of digital life: publishers, libraries, warehouses, and technical writers, to name a few, all make extensive use of XML for encoding books, articles, product information, and documentation, for interchanging and transforming data, and for extracting and aggregating relevant pieces of information. Contemporary web services are stepping away from traditional relational data models and moving towards semi-structured data representations and the use of NoSQL and, in particular, XML databases for persisting data. A considerable technology stack has been built around XML, making it even stronger as a format. The development of specifications for XML technologies is a never-ending process, and emerging implementations of them push progress further. However, there are still many open problems and challenges to be addressed in the XML domain. This thesis selects a number of the unsolved ones and suggests solutions; concretely, (i) versioning support for XML, (ii) XML database views as counterparts of relational database views, and (iii) support for XML document templating. Solving the XML versioning problem for a subset of use cases enables an XML persistence layer that tracks the history of changes and provides XML database functionality such as querying or indexing of data.
The concept of XML database views makes it possible to abstract away from the notion of XML files and to think in terms of customizable and editable abstract XML entities that originate in some XML documents. Finally, support for XML document templating facilitates the separation of responsibilities while authoring XML documents. Being able to draw on the diverse expertise of developers in different phases of document creation, in turn, optimizes the whole authoring workflow. To highlight the practical value of this work, all the concepts described in this thesis are implemented and integrated into the TNTBase system. Over the last four years, TNTBase has been in constant use in a number of projects, mainly research projects, and has matured through feedback from its users. The main goal of the TNTBase project was to provide a versioned repository for Open Mathematical Documents (OMDoc) with a strong focus on XML-related functionality. Since the OMDoc format combines data-like aspects (axioms, theorems, examples, etc.) with document-like aspects (sections, paragraphs, etc.), the applications were diverse, and therefore all the approaches have been generalized to make it possible to adapt TNTBase to other domains and XML languages. As a result, TNTBase has been deployed in multiple real-life scenarios where it has been used on a daily basis.

Metadata
Publishing Institution: IRC-Library, Information Resource Center der Jacobs University Bremen
Granting Institution: Jacobs Univ.
Author: Viacheslav Zholudev
Referee: Michael Kohlhase, Peter Baumann, Dieter Hutter
Advisor: Michael Kohlhase
Persistent Identifier (URN): urn:nbn:de:101:1-201307119503
Document Type: PhD Thesis
Date of Successful Oral Defense: 2012/07/17
Year of Completion: 2012
Date of First Publication: 2013/05/14
PhD Degree: Computer Science
School: SES School of Engineering and Science
Library of Congress Classification: Q Science / QA Mathematics (incl. computer science) / QA71-90 Instruments and machines / QA75.5-76.95 Electronic computers. Computer science / QA76.75-76.765 Computer software
Call No: Thesis 2012/60
