Open Data Format

Open, non-proprietary format for the exchange of research data and metadata for use with popular statistical programs.

The “Open Data Format” offers an innovative solution for data processing and data exchange in research. The aim of the project is to develop an open, non-proprietary, multilingual data format that is enriched with additional information and can be used in common statistical programs. The enriched metadata helps streamline the research process and meet the requirements of the FAIR principles (findable, accessible, interoperable, reusable).

For Research Data Centers (RDC)

The Open Data Format lets data producers publish a single, unified format that a wide range of users can work with through easily available and installable import filters. Data producers no longer need to create multiple formats for different user requirements, which streamlines data processing. The data can also be enriched with supplementary information that was previously difficult to include because of software-specific limitations, which improves data documentation. The format also benefits long-term archiving, because the data can be used independently of proprietary software.

For Researchers

The “Open Data Format” enables data users to process and analyze data in various software environments, offering efficient and flexible workflows without depending on proprietary software. Data users do not need to change their usage habits. The Open Data Format also carries more information than conventional data formats usually provide, such as links to data portals that can be accessed directly from the statistical software. With the adoption of the Open Data Format, data users gain new opportunities to access a broader range of datasets.

  • R Package is available: https://git.soep.de/opendata/r-package-opendataformat
    It imports Open Data Format files into an R data frame and exports R data frames to the Open Data Format. Users can also view detailed metadata about the dataset and its variables in either the RStudio Viewer or a web browser, so dataset information can be explored within the user’s preferred environment.
  • Stata Package is available: https://thartl-diw.github.io/opendf
    The Stata package supports working with the Open Data Format within the Stata environment. Like the R package, it imports data from the Open Data Format into a Stata dataset and saves a Stata dataset (.dta) to the Open Data Format. Users can also access metadata at both the dataset and variable levels.
  • Use Case: SOEP Data in Open Data Format
    The SOEP-Core is the centerpiece of the Socio-Economic Panel, a comprehensive longitudinal study of private households in Germany conducted by the German Institute for Economic Research (DIW Berlin). Recently, the Scientific Use Files of the Socio-Economic Panel (SOEP) have become available in the new Open Data Format (opendf). Instructions on how to use SOEP data in opendf can be found here: Working with SOEP Data in Open Data Format — SOEPcompanion 1.0.0 Documentation.
