Time Event
11:00–11:15 Opening
11:15–12:15 Plenary Keynote: Think Inside the Box: Glass-box Evaluation Methods for Neural MT
by Marina Fomicheva, University of Sheffield
Most current SOTA approaches to automatic NLG evaluation treat the system under evaluation as a black box. In this talk I will present an alternative approach that looks inside the system to gain insights into the quality of the generated outputs, using neural MT as an example. I will show how this idea can be applied to both reference-based and reference-free evaluation of MT outputs. As a side note, I will highlight some meta-evaluation aspects that affect the performance of automatic evaluation methods but often go unnoticed in large-scale evaluation campaigns.
[Slides] [Video]
12:15–12:50 Break
12:50–13:20 Elevator pitches for all papers
13:20–13:50 Poster session 1
- Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing
- Studying the Effects of Cognitive Biases in Evaluation of Conversational Agents
13:50–14:20 Poster session 2
- On the interaction of automatic evaluation and task framing in headline style transfer
- Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference
14:20–15:00 Break
15:00–16:00 Panel discussion with Q&A
Panelists: Marina Fomicheva, Yvette Graham, João Sedoc
The panelists will discuss the current limits, as well as the future, of NLG evaluation. What is currently missing in the evaluation of NLG systems? How can human evaluation and automatic metrics be improved? These and other questions will be covered in our live panel, which will also leave room for Q&A with the audience.
16:00–16:30 Poster session 3
- Informative Manual Evaluation of Machine Translation Output
- NUBIA: NeUral Based Interchangeability Assessor for Text Generation
- “This is a Problem, Don’t You Agree?” Framing and Bias in Human Evaluation for Natural Language Generation
16:30–16:50 Break
16:50–17:20 Poster session 4
- A proof of concept on triangular test evaluation for Natural Language Generation
- Evaluation rules! On the use of grammars and rule-based systems for NLG evaluation
- Evaluating AMR-to-English NLG Evaluation
17:20–18:20 General discussion, closing