E3C European Clinical Case Corpus

A project to create a corpus of clinical cases in five European languages


  • Our paper "The E3C Project: Collection and Annotation of a Multilingual Corpus of Clinical Cases" has been accepted for presentation at CLiC-it 2020, the Seventh Italian Conference on Computational Linguistics, Bologna, 1-3 March 2021

  • We will present E3C at the second annual ELG conference: META-FORUM 2020 - Piloting the European Language Grid, 1-3 December 2020


With the ambition of (i) making E3C a reference European corpus of clinical cases and (ii) boosting the clinical NLP scene for languages other than English, we will collect clinical cases in five languages (Italian, English, Spanish, French, and Basque) and enrich them with interoperable annotations.

Clinical Case

Clinical cases are accounts of clinical practice that present the reason for a clinical visit, the description of physical exams, and the assessment of the patient's situation. They are rich in clinical entities and in temporal information, both of which are almost absent from other clinical documents (e.g. radiological reports). Focusing on clinical cases, which are often de-identified, also allows us to overcome privacy issues.


E3C will be annotated with clinical entities (e.g. symptoms and pathologies), temporal information, and factuality. Furthermore, E3C will be widely available and re-usable, as we will build on top of resources that are distributed under public copyright licenses. E3C will thus encourage researchers to work on clinical data against a shared benchmark in a multilingual setting, fostering task and technology development, including applications for metadata tagging and for supporting clinical predictions.