In its latest report, the German Data Forum (RatSWD) addresses the challenges of exploiting big data in science and publishes its recommendations for researchers and (science) policy makers. Interest in using big data is growing in the social, behavioural, and economic sciences, as such data become increasingly common in an age of digitised communication and consumer behaviour. Such data are characterised by large numbers of cases, data collection methods that do not influence the people being observed, and the possibility of analysing and observing social interactions in real time. To tap their scientific potential, researchers must address a variety of legal and structural challenges, provided they can gain access to the data at all.
Even researchers who have been granted individual access to companies' big data sources are often denied access to strategically important variables or observations. They are often unable to pass the data they used on to other researchers, and they risk losing access before their research project ends. Potential conflicts of interest and publication restrictions pose further challenges, and re-use of the data, e.g. for replication studies or independent research questions, is usually not permitted.
Many researchers instead collect data from the internet themselves using web-scraping techniques, either through software interfaces or by reading websites in bulk. Because these data collection methods often involve legal uncertainty, the German Data Forum (RatSWD) obtained a legal opinion from the RobotRecht Research Centre of the University of Würzburg on the relevant legal issues of data access, data re-use, and data archiving; the opinion is published as part of the report.
The German Data Forum (RatSWD) recommends establishing independent trustee offices to address the lack of access to big data systematically and sustainably. These offices should mediate between the interests of researchers and those of the companies that hold big data. To do so, trustees would need to accept data from companies and then enable researchers to carry out independent analyses that comply with data protection law. The RatSWD therefore welcomes the assessment in the Federal Government's recently published data strategy that data trustees are a central instrument for an "increase in the voluntary sharing of data." It further recommends that the specific requirements of research with (personal) big data be taken into account as the data strategy is developed further.