The Pro-TEXT Corpus: A Generically Diverse Resource of Writing Keylogs and Finished Texts in French
Publisher
Edinburgh University Press
Abstract
This article introduces the Pro-TEXT corpus, a collection of product data in the form of finished texts and process data in the form of keystroke logs, all written in French and annotated for rewriting operations, burst types, and linguistic features. The corpus consists of six sub-corpora (Academic, Adult, Children, Dyslexia, Professional, and Translation) and is a unique example of openly available data generated by diverse writers of varying expertise across a range of domains and genres in French. The corpus can support research on school writing, academic writing, and professional writing. It can also support the teaching of writing and inform teacher training. A web platform developed for the corpus allows users to query it and visualize various features of real-time text production.
Description
This is the accepted manuscript version of the article, which is scheduled to appear in Corpora, issue 22.1 (early 2027).
Includes bibliographical references (pages 17-22)
Citation
Cislaru, G., & Sfeir, M. (in press). The Pro-TEXT corpus: A generically diverse resource of writing keylogs and finished texts in French. Corpora. Accepted Manuscript.