Call for papers

Access to sufficient amounts of data, and the ability to quantify them so as to detect as precisely as possible the emergence of new variants and the recession or even disappearance of others, is a valuable tool for the study of variation, whatever its dimension (diachronic, diatopic…) and its field (syntax, morphology…). The advent of large corpora has thus renewed the study of variation. NLP has contributed greatly to this renewal, providing tools both for the enrichment of these corpora (morphological taggers and syntactic parsers) and for their exploration.

In return, even when linguistic analysis cannot directly help improve the performance of these tools, it can, via annotation error analysis for example, help explain some of their errors (Brigada Villa & Giarda 2023, Manning 2011) and thus deepen the picture where performance metrics tend to flatten everything into a single number. NLP annotation tools, such as syntactic parsers and morphological taggers, now reach high performance when applied to data similar to those seen during their development. However, their performance quickly drops as the target data diverge from the training scenario (Dereza et al. 2023, Manjavacas & Fonteyn 2022). This raises a number of issues when it comes to using automatically annotated data for linguistic studies (Beck & Köllner 2023, Faria 2014, Säily et al. 2011).

We invite article submissions on all relevant topics, including but not limited to:

  • Quantification of variation along its different dimensions (both external and internal, as well as in interaction with each other);
  • Impact of annotation errors on the study of marginal structures (emergent or receding);
  • Syntactic variation when it is induced by semantic change;
  • Variation mitigation (spelling standardisation…);
  • Domain adaptation (domain referring here to any dimension of variation);
  • Error analysis (in and out of domain) in light of known variation phenomena;
  • The evolution of grammatical categories and its impact on prediction models;
  • The place of variation studies in NLP in the large language model era.

Submission

Submissions take the form of abstracts of up to 500 words (not including references). They must be fully anonymous and are to be uploaded at https://easychair.org/my/conference?conf=llcd2024.

References

  • Beck, C., & Köllner, M. (2023). « GHisBERT – Training BERT from scratch for lexical semantic investigations across historical German language stages ». Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change.
  • Brigada Villa, L., & Giarda, M. (2023). « Using Modern Languages to Parse Ancient Ones: a Test on Old English ». Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, 30-41.
  • Dereza, O., Fransen, T., & McCrae, J. (2023). « Temporal Domain Adaptation for Historical Irish ». Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023).
  • Faria, P. (2014). « Using Dominance Chains to Detect Annotation Variants in Parsed Corpora ». 2014 IEEE 10th International Conference on e-Science, 2, 25-32.
  • Manjavacas, E., & Fonteyn, L. (2022). « Adapting vs. pre-training language models for historical languages ». Journal of Data Mining & Digital Humanities, 1-19.
  • Manning, C. D. (2011). « Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? ». Computational Linguistics and Intelligent Text Processing, CICLing 2011.
  • Säily, T., Nevalainen, T., & Siirtola, H. (2011). « Variation in noun and pronoun frequencies in a sociohistorical corpus of English ». Literary and Linguistic Computing, 26(2), 167-188.