3. Data Life Cycle


3.5 Archives

3.5.1 Formats

Format refers to the way data is organized in an electronic file. Saving data in sustainable, standard formats is a major data management issue: well-preserved data remains available to users regardless of how access tools evolve.

DATA ARCHIVING:

  • save data in non-proprietary, open formats so that it remains accessible independently of software types and versions;
  • use clear identifiers for data files;
  • define an archiving strategy that includes two different physical formats (e.g., hard disk and CD) stored in two different physical locations;
  • scan paper documentation and save it in a portable file format (e.g., PDF/A);
  • ensure that the archiving location for data (digital and non-digital) is appropriate and safe;
  • copy data to new media (magnetic tape, disk, etc.) 2 to 5 years after their creation, as certain media can deteriorate;
  • check archived data integrity regularly.
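
The last point, regular integrity checks, can be sketched with checksums. The example below is a minimal illustration rather than a prescribed procedure: it assumes SHA-256 as the checksum and an in-memory manifest, and the function names (`sha256_of`, `build_manifest`, `find_problems`) are hypothetical. A manifest is recorded when the archive is created and is later compared with the files actually on the medium.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to archive_dir) to its checksum."""
    return {
        p.relative_to(archive_dir).as_posix(): sha256_of(p)
        for p in sorted(archive_dir.rglob("*"))
        if p.is_file()
    }


def find_problems(archive_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return recorded files that are now missing or whose content changed."""
    current = build_manifest(archive_dir)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

In practice the manifest would be stored alongside each copy of the archive (for example as a text file) and `find_problems` run on a schedule; an empty result means that every recorded file is still present and unchanged.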

For instance, ISO standard 19005 proposes PDF/A as a long-term data conservation format [1]. ISO and the IEC (International Electrotechnical Commission) have also approved the use of open formats such as ODF [2], designated ISO/IEC 26300, to ensure interoperability and efficient data access. This standard is used notably for "Office Suite" documents, including texts, spreadsheets, presentations and graphical elements.

The US Library of Congress has adopted a comprehensive approach to format issues, including the development of a detailed decision process for data preservation and file formats [3].

Image and video file formats have also received considerable attention. For instance, Nozères (2011) carried out an in-depth analysis that describes in detail best practices for image/video file formats, geotagging, metadata and archiving options [4].

FILE NAMING:

  • use meaningful file names (e.g., 2012-Tadoussac-beluga009.xls for beluga biological data file 009 collected in 2012 in the Tadoussac area);
  • avoid spaces, accents and special characters such as $, /, %, &, @ and !;
  • avoid very long file names (e.g., 2012-08-19 Tadoussac bio samples beluga whales JeanLambert.xls).
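
These naming rules can be checked automatically before files are archived. The sketch below is only illustrative: the text does not set an exact length limit or list of permitted characters, so `MAX_LENGTH = 40` and the allowed set (unaccented letters, digits, dot, underscore, hyphen) are assumptions, and `file_name_problems` is a hypothetical helper name.

```python
import re

# Assumed policy values; adjust them to the local naming convention.
MAX_LENGTH = 40
ALLOWED = re.compile(r"[A-Za-z0-9._-]+")


def file_name_problems(name: str) -> list:
    """Return the naming rules that a file name breaks (empty list if none)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    # Spaces are reported separately, so ignore them for the character check.
    if not ALLOWED.fullmatch(name.replace(" ", "")):
        problems.append("contains accents or special characters")
    if len(name) > MAX_LENGTH:
        problems.append("too long")
    return problems


print(file_name_problems("2012-Tadoussac-beluga009.xls"))
print(file_name_problems(
    "2012-08-19 Tadoussac bio samples beluga whales JeanLambert.xls"))
```

With these assumed limits, the first example name passes every check, while the second is reported as containing spaces and being too long.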

  1. ISO. 2009. ISO 19005-1:2005 Document management - Electronic document file format for long-term preservation - Part 1: Use of PDF 1.4 (PDF/A-1). http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=38920
  2. Open Document Format (ODF). http://www.opendocumentformat.org/aboutODF
  3. US Government – Library of Congress. Digital Preservation. http://www.digitalpreservation.gov/formats/intro/format_eval_rel.shtml
  4. Nozères, C. 2011. Gestion des données d'images en sciences aquatiques: une introduction aux bonnes pratiques et aux flux de travail. Rapp. Tech. can. Sci. halieut. Aquat. 2962F: xiv + 195 p. http://www.dfo-mpo.gc.ca/Library/345814.pdf