File formats

Using appropriate file formats and having robust conventions for file naming, versioning and organisation are crucial for guaranteeing that data can be accessed, used, shared and preserved in the long term.

It is recommended to use standard and interchangeable or open, lossless data formats in order to minimise the risk of data becoming inaccessible should the hardware or software environments they depend on become obsolete. Where possible, this should be considered at the time of data collection.

Alternatively, data may be converted once analysis is complete and the data are ready for storage, publication and preservation. Conversion can often be achieved using a program's export or 'save as' functionality.

Did you know?

The University of St Andrews' Apps Anywhere service also offers the free software tool Pandoc, which can be used to convert files.
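
As an illustration, a Pandoc conversion could be scripted along the following lines. This is a minimal sketch in Python, assuming Pandoc is installed and available on the system path. The helper names `pandoc_command` and `convert` are hypothetical and chosen for this example only.

```python
import subprocess
from pathlib import Path


def pandoc_command(source, target):
    """Build the argument list for a basic Pandoc conversion.

    Pandoc infers the input and output formats from the file extensions.
    (Hypothetical helper, for illustration only.)
    """
    return ["pandoc", str(Path(source)), "-o", str(Path(target))]


def convert(source, target):
    """Run Pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, target), check=True)
```

For example, calling `convert("report.docx", "report.odt")` with a hypothetical file `report.docx` would ask Pandoc to write an OpenDocument Text copy alongside the original.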

Suitable file formats should be:

  • recognised and used commonly by the research community,
  • in a standard representation (ASCII, Unicode),
  • uncompressed,
  • unencrypted,
  • open and interchangeable (OpenDocument Format (ODF), tab-delimited, comma-separated values, XML),
  • suitable for extracting and viewing the data,
  • easy to annotate with metadata.
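
Several of these criteria can be met when exporting tabular data programmatically. The following is a minimal sketch in Python, using only the standard library, that writes rows to an uncompressed, UTF-8 encoded, comma-separated file and checks that a file decodes as UTF-8. The function names and sample values are assumptions made for illustration, not a prescribed workflow.

```python
import csv
from pathlib import Path


def write_open_csv(rows, path):
    """Write rows as an uncompressed, UTF-8 encoded, comma-separated file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        csv.writer(handle).writerows(rows)


def is_utf8_text(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A call such as `write_open_csv([["site", "temp_c"], ["Ålesund", "4.5"]], "readings.csv")` would produce a small file with a header row, which `is_utf8_text("readings.csv")` could then check before deposit. The file name and values here are invented.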

More information about open file formats can be found in several online resources, such as the UK Data Service, the Open Data Handbook, Wikipedia and a training module provided by the European Open Data Portal.

The National Archives' technical registry, PRONOM, also provides detailed information about individual file formats, software solutions and technical requirements for supporting long-term access to electronic records and other digital objects.