The Pro-TEXT Corpus: A Generically Diverse Resource of Writing Keylogs and Finished Texts in French

Publisher

Edinburgh University Press

Abstract

This article introduces the Pro-TEXT corpus, a collection of French-language product data (finished texts) and process data (keystroke logs), annotated for rewriting operations, burst types, and linguistic features. The corpus consists of six sub-corpora (Academic, Adult, Children, Dyslexia, Professional, and Translation). It is a rare example of openly available French data produced by writers of varying expertise across a range of domains and genres. The corpus can support research on school writing, academic writing, and professional writing. It can also support the teaching of writing and inform teacher training. A web platform developed for the corpus allows users to query it and visualize various features of real-time text production.

Description

This is the accepted manuscript version of the article. It is scheduled for publication in Corpora issue 22.1 (early 2027).
Includes bibliographical references (pages 17-22)

Citation

Cislaru, G., & Sfeir, M. (in press). The Pro-TEXT corpus: A generically diverse resource of writing keylogs and finished texts in French. Corpora. Accepted Manuscript.
