News: The workshop is now over, but you can still have a look at all presented papers, as well as a video recording and slides from Marina Fomicheva’s keynote.

Workshop overview:

This workshop is intended as a discussion platform on the status and the future of the evaluation of Natural Language Generation systems. Among other topics, we will discuss current evaluation quality, human versus automated metrics, and the development of shared tasks for NLG evaluation. The workshop also involves an ‘unshared task’, where participants are invited to experiment with evaluation data from earlier shared tasks.

Important Dates:

  • First Call for papers released: July 20, 2020
  • Second Call for papers released: September 16, 2020
  • Submission deadline: extended to October 19, 2020 (UTC-12)
  • Notification of acceptance: November 15, 2020
  • INLG 2020 registration: the workshop is included in the main conference registration; workshop-only registration costs €50.
  • Camera ready papers due: November 30, 2020
  • Workshop: December 18, 2020 (adjacent to INLG)


We encourage a range of papers, from commentary on and meta-evaluation of existing evaluation strategies to suggestions for new metrics. We place particular emphasis on the methodological and linguistic aspects of evaluation rather than on proposals of new automatic metrics. We invite papers on any topic related to the evaluation of NLG systems, including (but not limited to):

  • Qualitative studies, definitions of evaluation metrics (e.g., readability, fluency, semantic correctness)
  • Crowdsourcing strategies, qualitative tests for crowdsourcing (How to elucidate evaluation metrics?)
  • Individual differences and cognitive biases in human evaluation (expert vs. non-expert, L1 vs. L2 speakers)
  • Best practices for system evaluations (How does your lab choose models?)
  • Approaches to qualitative studies and error analysis
  • Demo: Systems that make the evaluation easier
  • Comparison of metrics across different NLG tasks (captioning, data2text, story generation, summarization…) or different languages (with a focus on low-resource languages)
  • Evaluation surveys
  • Opinion pieces and commentary on trends in evaluation

Task proposals submitted to the Generation Challenges at INLG 2020 may also be submitted to this workshop. If accepted for the main conference, they may still be presented at the workshop as non-archival papers. We also encourage the submission of more preliminary work discussing the main challenges in setting up a shared task in the NLG domain.

Unshared Task:

This year’s edition also features an unshared task: rather than working towards a specific goal, participants are encouraged to use a specific collection of datasets for any evaluation-related goal, for example, comparing a new evaluation method with existing ratings or carrying out a subset analysis. This allows us to put the results of previous shared tasks in perspective and helps us develop better evaluation metrics for future shared tasks. Working on the same datasets also allows for more focused discussions at the workshop.

Datasets for this year’s edition are existing datasets with system outputs and human ratings. Participants may use any of these for their unshared task submission:

Submission Formats:

  • Archival papers (up to 8 pages, excluding references; shorter submissions are also welcome)
  • Non-archival abstracts of on-topic papers accepted elsewhere or under submission at the main INLG 2020 conference (1-2 pages, excluding references)
  • Demo papers (1-2 pages, excluding references)

Submission instructions:

The workshop follows the INLG 2020 submission instructions, i.e., all submissions should follow the ACL Author Guidelines and policies for submission, review, and citation, and should be anonymised for double-blind reviewing. Papers should use the ACL 2020 style; LaTeX style files, Microsoft Word templates, and an Overleaf template are provided by ACL 2020.

Papers should be submitted electronically through the EasyChair conference management system.


Shubham Agarwal, Ondrej Dusek, Sebastian Gehrmann, Dimitra Gkatzia, Ioannis Konstas, Emiel van Miltenburg, Sashank Santhanam, Samira Shaikh