
CODI – Automated Coding of Open Responses

Automated and efficient coding of text responses to open-ended questions into standardised, quality-assured categories.

Open and semi-open response formats in (social) scientific surveys mainly serve to operationalize constructs that cannot be meaningfully measured with predefined response options in the questionnaire, because the categories would be either too unspecific or too numerous. A typical example is information on occupational activities. The CODI tool developed by the Research Data Center at the Leibniz Institute for Educational Trajectories (RDC LIfBi, Bamberg) provides an infrastructure for automated ad hoc coding of such textual information.

CODI for Researchers

In many quantitative surveys, researchers need to include questions with open or semi-open response formats. The analysis potential of the information collected in this way depends to a large degree on the subsequent classification of the text entries into corresponding standard variables (e.g. KldB, ISCO, ISEI for occupations). The central “translation process” is the coding of the text data into a suitable category scheme. With schemes comprising hundreds or even thousands of categories, manual coding is not only enormously time-consuming and resource-intensive, but also error-prone. The web-based and database-driven CODI tool offers researchers a way to make this coding more efficient by using semi-automated algorithms. For (multiple) coding of text responses, suggestions are automatically generated according to a pre-configured coding guideline and the material already available in the database. In addition, the assigned codes can be validated within the tool for quality assurance purposes, and documentation of the coding process can be created.
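The general idea of generating code suggestions from already coded material can be illustrated with a minimal Python sketch. It is not CODI's actual implementation: the function names, the similarity threshold, the example texts and the placeholder codes (`C-01` etc.) are all assumptions made for illustration, and simple string similarity from the standard library stands in for whatever matching rules a real coding guideline would define.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for previously coded material in a database:
# response text -> category code (placeholder codes, not a real classification).
CODED_MATERIAL = {
    "primary school teacher": "C-01",
    "nurse": "C-02",
    "truck driver": "C-03",
}


def normalize(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def suggest_codes(response, coded_material, max_suggestions=3, min_score=0.6):
    """Return up to max_suggestions tuples (code, matched_text, score),
    ordered by descending similarity to the new response."""
    query = normalize(response)
    scored = []
    for known_text, code in coded_material.items():
        score = SequenceMatcher(None, query, normalize(known_text)).ratio()
        if score >= min_score:
            scored.append((score, code, known_text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [
        (code, known_text, round(score, 2))
        for score, code, known_text in scored[:max_suggestions]
    ]


if __name__ == "__main__":
    # A misspelled response still yields a suggestion for a human coder to confirm.
    for suggestion in suggest_codes("Truck  drivr", CODED_MATERIAL):
        print(suggestion)
```

In a semi-automated workflow of this kind, the suggestions would only be proposals: a human coder confirms or corrects them, and confirmed assignments can be added to the coded material so that later responses benefit from them.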

CODI for Research Data Centers

Numerous Research Data Centers (RDCs) enable the scientific community to reuse data resources from large-scale, quantitative (panel) studies. Some of these data collections are prepared and documented at the RDCs themselves, while others are handed over by the primary researchers to the RDCs for curation and provision. If these data contain responses to open-ended questions, the CODI tool can be used to subsequently convert the text entries into suitable codes and to derive appropriate standard classifications. Through the interface to an existing or a specifically established database, RDCs can code even large amounts of open-response information efficiently and reproducibly using semi-automated routines. In this way, CODI contributes to enriching data resources and thus to a better exploitation of their analysis potential.