This document defines the fundamentals of text documents for long-term preservation, covering the concept, elements and components of text documents.
Text document files created in formats such as ODT, DOC and RTF are typically converted into PDF/A so that they can be preserved over time.
However, since PDF was originally developed as a format for printing, only visual information is preserved; format elements such as titles and tables are not. Converting a text document into a PDF might therefore eliminate its unique characteristics (e.g. content elements such as multi-column layouts, charts, hidden descriptions, context information and metadata).
Therefore, text documents containing information that needs to be parsed and/or programmatically processed should be preserved in their original file format without being converted.
However, when preserving text documents themselves over time, it might be difficult to maintain their original characteristics and to deal with technical obsolescence, for the reasons below.
- Content elements and types of text documents can vary visually; even the same file can be laid out differently depending on the software.
- Some text documents might not include metadata. Even when metadata is included, it might not be sufficient to represent their contextual information properly.
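The metadata concern above can be illustrated with a minimal sketch. It assumes an ODT file, which is a ZIP package whose metadata is stored in a `meta.xml` entry. The function name `missing_metadata` and the `REQUIRED_FIELDS` list are hypothetical choices made for this illustration; they are not requirements defined by this document.

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used in the meta.xml entry of an OpenDocument package.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical minimum set of fields, chosen only for this illustration.
REQUIRED_FIELDS = ("dc:title", "dc:creator", "dc:date")


def missing_metadata(odt_file):
    """Return the REQUIRED_FIELDS that are absent or empty in an ODT package.

    odt_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(odt_file) as package:
        if "meta.xml" not in package.namelist():
            # No metadata entry at all: every field is missing.
            return list(REQUIRED_FIELDS)
        root = ET.fromstring(package.read("meta.xml"))

    meta = root.find("office:meta", NS)
    if meta is None:
        return list(REQUIRED_FIELDS)

    missing = []
    for field in REQUIRED_FIELDS:
        element = meta.find(field, NS)
        # Treat whitespace-only values as missing.
        if element is None or not (element.text or "").strip():
            missing.append(field)
    return missing
```

A check like this only detects whether metadata is present. Whether the metadata is sufficient to represent contextual information still requires a judgment that a script cannot make.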
For these reasons, many countries and institutions, e.g. the Library of Congress, the Arts and Humanities Data Service, the Digital Preservation Coalition and the British Library, have tried to find ways to select proper text document formats for long-term preservation. However, this remains a difficult problem for the reasons below.
- The elements, composition and components of text documents for long-term preservation are not specifically or sufficiently defined.
- It is difficult to be sure whether the original characteristics of text documents can be maintained over time and whether technical obsolescence can be dealt with.
- There are neither reference models nor assessment models for the long-term preservation of text documents.
To solve this problem, this document presents guidelines for the long-term preservation of text documents, with standardization addressing the following questions:
- What are the elements and components necessary for long-term preservation of text documents?
- What are the requirements for the long-term preservation of text documents?
- What are the logical reference models (visual, semantic, structural layers) and physical reference models (format specifications) for the long-term preservation of text documents?
- How should conformity assessment be carried out for text documents to establish the trustworthiness of their long-term preservation?
The expected benefits of this standardization are as follows.
- Providing directions for supplementing and assessing the long-term preservation of existing text document formats.
- Serving as criteria for selecting file formats of text documents for long-term preservation.
- Supporting the design of archiving systems, ECM systems and/or records systems, etc., to improve the usability of text documents.