Linking textual data

Linking textual data with other types of data.

Whether textual data or digital behavioural data: so-called “new data” makes it possible to address questions that “established data” such as survey data or official statistics can answer only to a limited extent. At the same time, the full potential of this data can be realised only if established and new data are linked efficiently and robustly. Linking data can thus provide new insights.

Established data is frequently still disconnected from new data, which is often unstructured. This is where the open-source R package LinkTools comes in. LinkTools takes into account the needs and pitfalls of linking textual data with established data in the social sciences. Using unique identifiers, LinkTools enables the linking of metadata in documents and entities in continuous text with external data. For example, in the case of metadata, speakers in the German parliament can be assigned to an electoral district. In the case of continuous text, locations can be recognised, resolved to unique concepts, and enriched with external information. Such links can be used, for example, to investigate the relationship between how an issue is represented in public debate and how it is represented in surveys.
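The idea of linking via unique identifiers can be illustrated with a minimal sketch in base R. It does not use the LinkTools API; it only shows the underlying join, and all data, column names and identifiers below are invented for illustration.

```r
# Hypothetical speech metadata: one row per speech, with a unique speaker ID.
speeches <- data.frame(
  speech_id = c("s1", "s2", "s3", "s4"),
  speaker_id = c("p01", "p02", "p01", "p03"),
  stringsAsFactors = FALSE
)

# Hypothetical external data: one row per speaker, keyed by the same ID.
districts <- data.frame(
  speaker_id = c("p01", "p02"),
  district = c("District A", "District B"),
  stringsAsFactors = FALSE
)

# Left join on the shared identifier: every speech is kept, and the
# district is attached wherever the speaker ID has a match.
linked <- merge(speeches, districts, by = "speaker_id", all.x = TRUE)

# merge() sorts by the key, so restore the original speech order.
linked <- linked[order(linked$speech_id), ]

# Speeches whose speaker ID has no counterpart in the external data
# end up with NA and can be inspected separately.
unmatched <- linked[is.na(linked$district), ]
```

Keeping unmatched rows (all.x = TRUE) rather than silently dropping them makes gaps in the linkage visible, which matters when the identifiers in the two sources are not guaranteed to overlap completely.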

The package is currently at an early stage of development and is being developed openly on GitHub. The GitHub repository contains the code and further information on installing and using the package.

Further related R packages:
DBpedia
polmineR
cwbtools
RcppCWB