Cross-Lingual Textual Entailment (CLTE) is the task of identifying multi-directional entailment relations between two sentences, T1 and T2, written in different languages.
Each T1/T2 pair in the dataset is annotated, in XML format, with one of the following entailment relations:
- Bidirectional (T1 -> T2 & T1 <- T2): the two fragments entail each other (semantic equivalence)
- Forward (T1 -> T2 & T1 !<- T2): unidirectional entailment from T1 to T2
- Backward (T1 !-> T2 & T1 <- T2): unidirectional entailment from T2 to T1
- No Entailment (T1 !-> T2 & T1 !<- T2): there is no entailment between T1 and T2
Both T1 and T2 are assumed to be TRUE statements; hence the dataset contains no contradictory pairs.
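The four relations are simply the four combinations of two directional judgments (does T1 entail T2, and does T2 entail T1). A minimal Python sketch of that mapping follows; the names `Entailment` and `combine` and the label strings are illustrative assumptions, not part of the dataset or of any official tooling.

```python
from enum import Enum


class Entailment(Enum):
    # Label strings are hypothetical; adjust them to the values used in the XML files.
    BIDIRECTIONAL = "bidirectional"
    FORWARD = "forward"
    BACKWARD = "backward"
    NO_ENTAILMENT = "no_entailment"


def combine(t1_entails_t2: bool, t2_entails_t1: bool) -> Entailment:
    """Map two directional entailment judgments to one of the four CLTE relations."""
    if t1_entails_t2 and t2_entails_t1:
        return Entailment.BIDIRECTIONAL   # T1 -> T2 & T1 <- T2
    if t1_entails_t2:
        return Entailment.FORWARD         # T1 -> T2 & T1 !<- T2
    if t2_entails_t1:
        return Entailment.BACKWARD        # T1 !-> T2 & T1 <- T2
    return Entailment.NO_ENTAILMENT       # T1 !-> T2 & T1 !<- T2
```

A system that makes two independent directional decisions could use such a function to produce the final four-way label.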
The CLTE datasets were created within the EU-funded project CoSyne (Multilingual Content Synchronization with Wikis).
CLTE datasets are available for four language combinations, each containing 1,500 CLTE pairs.
Additionally, a monolingual English dataset is available as a by-product of the data collection methodology (1,500 pairs).
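As a rough illustration of how the XML-annotated pairs might be read, the Python sketch below parses a small inline sample with the standard library. The element and attribute names (`entailment-corpus`, `pair`, `id`, `entailment`, `t1`, `t2`), the label strings, and the sample sentences are all assumptions made up for this example; check them against the actual files before relying on them.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the schema and sentences are invented for illustration only.
SAMPLE = """<entailment-corpus>
  <pair id="1" entailment="forward">
    <t1>Juan compró un coche rojo ayer.</t1>
    <t2>Juan bought a car.</t2>
  </pair>
  <pair id="2" entailment="bidirectional">
    <t1>El gato duerme.</t1>
    <t2>The cat is sleeping.</t2>
  </pair>
</entailment-corpus>"""


def load_pairs(xml_text: str) -> list[dict]:
    """Return one dict per <pair> element with its id, label, and both texts."""
    root = ET.fromstring(xml_text)
    pairs = []
    for pair in root.iter("pair"):
        pairs.append({
            "id": pair.get("id"),
            "label": pair.get("entailment"),
            "t1": (pair.findtext("t1") or "").strip(),
            "t2": (pair.findtext("t2") or "").strip(),
        })
    return pairs


pairs = load_pairs(SAMPLE)
# Count how many pairs carry each entailment label.
label_counts = Counter(p["label"] for p in pairs)
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`; the label counts give a quick check of the class distribution in each 1,500-pair dataset.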
The CLTE-SemEval dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Publications or presentations containing results obtained through the use of CLTE-SemEval should cite the following reference:
Matteo Negri, Luisa Bentivogli, Yashar Mehdad, Danilo Giampiccolo, and Alessandro Marchetti. 2011.
Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora.
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP 2011).