FILE FORMATS
Choosing suitable formats for your research data at the outset of a project is crucial, because the format determines how the data can be used, analyzed, stored, and reused.
The table below gives guidance on recommended and acceptable file formats, based on guidance from the UK Data Service.
Type of Data | Recommended formats | Other acceptable formats |
Quantitative tabular data with extensive metadata. A dataset with variable labels, code labels, and defined missing values, in addition to the matrix of data. | Proprietary formats of statistical packages, e.g. SPSS (.sav), Stata (.dta), SAS (.sas7bdat). | SPSS portable format (.por). MS Access (.mdb/.accdb). |
Quantitative tabular data with minimal metadata. A matrix of data with or without column headings or variable names, but no other metadata or labeling. | Comma-separated values (CSV) file (.csv). | Delimited text of given character set – only characters not present in the data may be used as delimiters (.txt). Widely-used formats: MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), OpenDocument Spreadsheet (.ods). |
Geospatial data. Vector and raster data. | ESRI Shapefile (essential – .shp, .shx, .dbf, optional – .prj, .sbx, .sbn). Geo-referenced TIFF (.tif, .tfw). CAD data (.dwg). Tabular GIS attribute data. | ESRI Geodatabase format (.mdb). |
Qualitative data. Textual. | eXtensible Mark-up Language (XML) text according to an appropriate Document Type Definition (DTD) or schema (.xml). Rich Text Format (.rtf). Plain text data, ASCII (.txt). | Hypertext Mark-up Language (.html). |
Digital image data. | TIFF version 6 uncompressed (.tif). Digital Imaging and Communications in Medicine (DICOM) (.dcm, .dcm30) – for CT/MRI data. | JPEG (.jpeg, .jpg) but only if created in this format. TIFF (other versions) (.tif, .tiff). Adobe Portable Document Format (PDF/A, PDF) (.pdf). Standard applicable RAW image format (.raw). Photoshop files (.psd). BMP (.bmp) but only if created in this format. PNG (.png) but only if created in this format. |
Digital audio data. | Free Lossless Audio Codec (FLAC) (.flac). | MPEG-1 Audio Layer 3 (.mp3) if original created in this format. |
Digital video data. | MPEG-4 (.mp4). OGG video (.ogv, .ogg). Motion JPEG 2000 (.mj2). | MOV (.mov). Windows Media Video (WMV) (.wmv). WebM (.webm). |
Documentation and scripts. | Rich Text Format (.rtf). | Plain text (.txt). |
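
As an illustration of how the table above might be applied in practice, the following minimal Python sketch checks a file name's extension against a small, hand-transcribed subset of its rows (audio, video, and tabular data with minimal metadata). The names FORMATS and classify_format are hypothetical, and the lookup ignores conditions such as "if original created in this format", so treat it as a starting point rather than a complete check.

```python
from pathlib import Path

# Partial, hand-transcribed subset of the table above. Assumption: only three
# data types are covered; FORMATS and classify_format are illustrative names.
FORMATS = {
    "tabular_minimal_metadata": {
        "recommended": {".csv"},
        "acceptable": {".txt", ".xls", ".xlsx", ".mdb", ".accdb", ".ods"},
    },
    "audio": {
        "recommended": {".flac"},
        "acceptable": {".mp3"},
    },
    "video": {
        "recommended": {".mp4", ".ogv", ".ogg", ".mj2"},
        "acceptable": {".mov", ".wmv", ".webm"},
    },
}


def classify_format(data_type: str, filename: str) -> str:
    """Return 'recommended', 'acceptable', or 'not listed' for a file name."""
    if data_type not in FORMATS:
        raise ValueError(f"unknown data type: {data_type!r}")
    # Compare extensions case-insensitively, so ".WMV" matches ".wmv".
    extension = Path(filename).suffix.lower()
    for status, extensions in FORMATS[data_type].items():
        if extension in extensions:
            return status
    return "not listed"


if __name__ == "__main__":
    print(classify_format("audio", "interview_01.flac"))
    print(classify_format("video", "focus_group.WMV"))
    print(classify_format("tabular_minimal_metadata", "survey.sav"))
```

The data type is passed explicitly because the same extension (for example .txt or .mdb) appears under more than one data type in the table, so the extension alone does not identify which row applies.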