Unstructured data are particularly valuable for research because they can reflect people's lives and behaviour in everyday settings. Because the data are generated in an uncontrolled manner and have no fixed format, they are subject to specific sources of error that may differ from those of classic survey studies. In its handout, the German Data Forum draws attention to the challenges of collecting, processing, and analysing unstructured data. The handout makes students and researchers in the social, behavioural, and economic sciences aware of possible sources of error so that they can draw conclusions for their own work.
Total Error Framework (TEF) for Big Data as the basis for the handout
Building on the TEF for Big Data, a working group of the German Data Forum developed guiding questions on potential sources of error in scientific work with unstructured data. These questions were discussed at a workshop with experts from various disciplines. Drawing on the results, the German Data Forum describes the challenges of collecting and using unstructured data and lays a foundation for quality standards.
The complete publication is available for free download on the German Data Forum website: https://www.konsortswd.de/en/publication/unstructured-data/
Since 2004, the German Data Forum (RatSWD) has advised the federal government and the governments of the federal states on expanding and improving the research data infrastructure for the empirical social, behavioural, and economic sciences. It is made up of ten elected representatives from the social, behavioural, and economic disciplines who work together with ten representatives from key data producers.
The German Data Forum (RatSWD) is part of the Consortium for Social, Behavioural, Educational, and Economic Sciences (KonsortSWD) in the National Research Data Infrastructure (NFDI). It acts as an institutionalised forum for dialogue between science and data producers, and it develops recommendations and opinions. It is committed to supporting an infrastructure that gives the sciences broad, flexible, and secure access to data. These data are provided by state, science-based, and private-sector actors. There are currently 42 research data centres accredited by the German Data Forum (RatSWD), which encourages cooperation among them.