Open and semi-open response formats are part of almost every survey in the social, behavioural, educational, and economic sciences. In most cases, they are used to capture constructs that are too complex to present as a closed list in a questionnaire, for example because they have too many categories. A typical example is information on occupational activity. Whether such information can be used in quantitative analyses depends largely on the subsequent coding of the text entries into suitable standard classifications (e.g., KldB, ISCO). However, because these classifications often comprise hundreds or even thousands of categories, manual coding is not only time-consuming and resource-intensive but also error-prone.
CODI will establish an infrastructure for the efficient and quality-assured coding of textual information. The main focus is on regularly collected information on occupations and sectors as well as on education and training. An essential element of the infrastructure is a database-driven software tool that enables partially automated, algorithm-supported coding of open responses. The tool is designed to be user-friendly and can be accessed via appropriate interfaces. It supports (multiple) coding and validation with the help of automatically generated suggestions, as well as commenting and process documentation. By enriching data collections, CODI aims to increase their potential for re-use. CODI is developed and operated at the Research Data Center of the Leibniz Institute for Educational Trajectories (LIfBi) in Bamberg in cooperation with other partners.
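To illustrate what "automatically generated suggestions" for open responses can look like in principle, the following minimal sketch ranks candidate categories for a free-text entry by string similarity. It is not CODI's actual algorithm, which is not described here. The codebook, its codes (`X01` to `X04`), the function names, and the thresholds are all hypothetical placeholders, and simple string similarity stands in for whatever matching method a production tool would use.

```python
from difflib import SequenceMatcher

# Hypothetical mini-codebook: placeholder codes, NOT real KldB or ISCO codes.
CODEBOOK = {
    "X01": "nurse",
    "X02": "primary school teacher",
    "X03": "carpenter",
    "X04": "software developer",
}


def normalise(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def suggest_codes(response, codebook=CODEBOOK, top_n=3, min_score=0.5):
    """Return up to top_n (code, score) pairs, best match first.

    Scores are difflib similarity ratios between 0 and 1. Candidates
    scoring below min_score are dropped, so an empty list means the
    entry should go to a human coder without a suggestion.
    """
    cleaned = normalise(response)
    scored = []
    for code, label in codebook.items():
        score = SequenceMatcher(None, cleaned, normalise(label)).ratio()
        if score >= min_score:
            scored.append((code, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # A misspelt entry still receives a ranked suggestion for review.
    for code, score in suggest_codes("Software Developper"):
        print(code, round(score, 2))
```

In a partially automated workflow of this kind, the suggestions are only proposals: a human coder confirms or overrides them, which is where validation, commenting, and process documentation come in.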
A beta version of CODI will be made available in July 2022.