
Using Big Data: German Data Forum (RatSWD) publishes recommendations

Against the backdrop of the increasing scientific importance of big data sources, the German Data Forum (RatSWD) examines the legal and structural challenges of their use and makes recommendations to researchers and politicians alike. The report focuses on web-scraping processes and builds on the legal opinions of the RobotRecht Research Centre, which are published as part of the report.
To reconcile the interests of science and business, independent trustees could give researchers improved and sustainable access to data. In addition, the Federal government's data strategy should take into account the distinctive features of research with personal big data.

In its latest report, the German Data Forum (RatSWD) addresses challenges in the use of big data in science and publishes its recommendations to researchers and (science) policy makers. Interest in using big data is growing in the social, behavioural, and economic sciences, as such data become increasingly common in the age of digitized communication and consumer behaviour. Such data are characterised by large numbers of cases, by non-reactive collection methods (the collection itself does not influence the behaviour observed), and by the possibility of real-time analysis and observation of social interactions. To tap into this scientific potential, researchers must deal with a variety of legal and structural challenges, provided they have access to the data in the first place.

Researchers who have been granted individual access to company big data sources are often still denied access to strategically important variables or observations. They often cannot pass the data they used on to other researchers, and they are at risk of losing data access before the end of a research project. Potential conflicts of interest and publication restrictions pose additional challenges, and the re-use of data, e.g. for replication studies or independent research questions, is usually denied.

Many researchers use web-scraping techniques to obtain data from the internet themselves. In these cases, they either use software interfaces or read websites in bulk. Because these data collection methods are often accompanied by legal uncertainties, the German Data Forum (RatSWD) has obtained a legal opinion from the RobotRecht Research Centre of the University of Würzburg on relevant legal issues of data access, data re-use, and data archiving; it is published as part of the report.

The German Data Forum (RatSWD) recommends establishing independent trustee offices to address the lack of access to big data systematically and sustainably. These offices should mediate between the interests of researchers and those of the companies that hold big data. To accomplish this, trustees must accept data from companies and then enable researchers to carry out their own independent analyses in compliance with data protection rules. The German Data Forum (RatSWD) therefore welcomes the assessment in the Federal government’s recently published data strategy that the trustee concept is a central instrument for an “increase in the voluntary sharing of data.” The German Data Forum (RatSWD) recommends that the special features of research with (personal) big data be taken into account when the data strategy is developed further.