Open and semi-open response formats are part of almost every survey in the social, behavioural, educational, and economic sciences. In most cases, they are used to capture constructs that are too complex to present as a closed list in a questionnaire, for example because they comprise too many categories. A typical example is information on occupational activity. Whether such information can be used in quantitative analyses depends largely on the subsequent coding of the text entries according to suitable standard classifications (e.g., KldB, ISCO). However, because these classifications often comprise hundreds or even thousands of categories, manual coding is not only time-consuming and resource-intensive but also error-prone.
CODI will establish an infrastructure for the efficient and quality-assured coding of textual information. The main focus is on regularly collected information on occupations and sectors as well as on education and training. An essential element of the infrastructure is a database-driven software tool that enables partially automated, algorithm-supported coding of open responses. The tool is designed to be user-friendly and can be accessed via appropriate interfaces. It supports (multiple) coding of text responses to open-ended questions with the help of automatically generated suggestions, as well as validation and commenting or documentation of the coding process. By enriching data collections with coded information, CODI aims to help realise their additional potential for re-use. CODI is developed and operated at the Research Data Center of the Leibniz Institute for Educational Trajectories (LIfBi) in Bamberg.
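To make the idea of suggestion-based, partially automated coding more concrete, the following minimal Python sketch ranks candidate codes for a free-text occupation response by simple string similarity. It is an illustration only and does not describe CODI's actual algorithms, database, or interfaces: the function `suggest_codes`, the tiny `CODEBOOK`, and the `min_score` threshold are hypothetical, and the four-digit codes are meant to follow ISCO-08 but should be checked against the official classification before any real use.

```python
from difflib import SequenceMatcher

# Hypothetical mini-codebook mapping job titles to four-digit codes.
# Illustrative only; a real codebook would hold thousands of entries.
CODEBOOK = {
    "nursing professional": "2221",
    "primary school teacher": "2341",
    "software developer": "2512",
    "carpenter": "7115",
}


def suggest_codes(response, codebook=CODEBOOK, top_n=3, min_score=0.5):
    """Return up to top_n (code, title, score) suggestions, best first.

    Assumption: plain character-level similarity stands in for whatever
    matching algorithm a production tool would use.
    """
    # Normalise case and whitespace before comparing.
    text = " ".join(response.lower().split())
    scored = []
    for title, code in codebook.items():
        score = SequenceMatcher(None, text, title).ratio()
        if score >= min_score:
            scored.append((code, title, round(score, 2)))
    # Highest similarity first; a human coder confirms or rejects.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    for suggestion in suggest_codes("Primary  school teacher"):
        print(suggestion)
```

In such a design, an empty suggestion list would mean that no candidate passed the threshold, so the response falls back to fully manual coding; every suggestion that is returned would still be validated by a human coder.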