At its 21st session, the Standing Committee Research Data Infrastructure (FDI Committee) discussed the most recent Activities Report of the 34 research data centres (RDCs) accredited by the German Data Forum (RatSWD). Key figures from the report show strong demand for the data offered by the RDCs. In 2018, the RDCs counted a total of 46,464 external data users and 9,081 new users. During that year, the data supply was not only expanded methodologically and thematically but also grew by 369 new datasets. In 2018, 2,074 publications were based fully or in part on the 3,940 datasets available in total. In addition to accessing data directly at the 68 RDC locations in Germany, researchers downloaded data 71,488 times. Data use remains free of charge at almost all RDCs.
As of 31 December 2018, the RDCs employed staff amounting to 285 full-time equivalents. Of these, 60 percent were scientific employees, who, in addition to their tasks in user assistance and data curation, wrote a total of 528 scientific publications. This figure is significant in several respects: by (re-)using the RDCs' own data, staff members stay up to date on the methodological state of the art, improve their research counselling, and at the same time check data quality.
The key figures below are based on the annual monitoring, which provides continuous quality assurance for the research data infrastructure and documents its development. The monitoring feeds into the regular Activities Report of all RDCs accredited by the German Data Forum (RatSWD). The Activities Report 2018 can be retrieved here.
At its session, the FDI Committee also exchanged views with the computer scientist Luc Rocher from the Université catholique de Louvain in Belgium. Rocher's journal article on the potential for re-identification in incomplete datasets attracted much interest in the international press. The discussion at the session reaffirmed that the RDCs currently make re-identification in the datasets they offer de facto impossible. They achieve this through carefully regulated data access paths and robust anonymisation and pseudonymisation methods. In this way, particularly sensitive data, such as names, birth dates, or postal codes, are routinely removed or substantially coarsened.
At the same time, the RDCs welcome the research by Luc Rocher and his colleagues, which makes an important contribution to assessing the risks involved in publishing research data. Research on anonymisation and the risks of de-anonymisation should be intensified in the coming years to ensure transparency for researchers and study participants.