Eval4NLP
Publications
2023
Daniil Larionov, Vasiliy Viskov, George Kokush, Alexander Panchenko, Steffen Eger: Team NLLG submission for Eval4NLP 2023 Shared Task: Retrieval-Augmented In-Context Learning for NLG Evaluation. Eval4NLP (2023)
Rui Zhang, Fuhai Song, Hui Huang, Jinghao Yuan, Muyun Yang, Tiejun Zhao: HIT-MI&T Lab's Submission to Eval4NLP 2023 Shared Task. Eval4NLP (2023)
Abhishek Pradhan, Ketan Kumar Todi: Understanding Large Language Model Based Metrics for Text Summarization. Eval4NLP (2023)
Yuan Lu, Yu-Ting Lin: Characterised LLMs Affect its Evaluation of Summary and Translation. Eval4NLP (2023)
Savita Bhat, Vasudeva Varma: Large Language Models As Annotators: A Preliminary Evaluation For Annotating Low-Resource Language Content. Eval4NLP (2023)
Yanran Chen, Steffen Eger: Transformers Go for the LOLs: Generating (Humourous) Titles from Scientific Abstracts End-to-End. Eval4NLP (2023)
Joonghoon Kim, Sangmin Lee, Seung Hun Han, Saeran Park, Jiyoon Lee, Kiyoon Jeong, Pilsung Kang: Which is better? Exploring Prompting Strategy For LLM-based Metrics. Eval4NLP (2023)
Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2023, Bali, Indonesia, November 1, 2023. Eval4NLP (2023)
Vatsal Raina, Adian Liusie, Mark J. F. Gales: Assessing Distractors in Multiple-Choice Tests. Eval4NLP (2023)
Zahra Kolagar, Sebastian Steindl, Alessandra Zarcone: EduQuick: A Dataset Toward Evaluating Summarization of Informal Educational Content for Social Media. Eval4NLP (2023)
Lukas Weber, Krishnan Jothi Ramalingam, Matthias Beyer, Axel Zimmermann: WRF: Weighted Rouge-F1 Metric for Entity Recognition. Eval4NLP (2023)
Neema Kotonya, Saran Krishnasamy, Joel R. Tetreault, Alejandro Jaimes: Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task. Eval4NLP (2023)
Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror, Steffen Eger: The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics. Eval4NLP (2023)
Jad Doughman, Shady Shehata, Leen Al Qadi, Youssef Nafea, Fakhri Karray: Can a Prediction's Rank Offer a More Accurate Quantification of Bias? A Case Study Measuring Sexism in Debiased Language Models. Eval4NLP (2023)
Yixuan Wang, Qingyan Chen, Duygu Ataman: Delving into Evaluation Metrics for Generation: A Thorough Assessment of How Metrics Generalize to Rephrasing Across Languages. Eval4NLP (2023)
Ghazaleh Mahmoudi: Exploring Prompting Large Language Models as Explainable Metrics. Eval4NLP (2023)
Jeremy Block, Yu-Peng Chen, Abhilash Budharapu, Lisa Anthony, Bonnie J. Dorr: Summary Cycles: Exploring the Impact of Prompt Engineering on Large Language Models' Interaction with Interaction Log Information. Eval4NLP (2023)
Pavan Baswani, Ananya Mukherjee, Manish Shrivastava: LTRC_IIITH's 2023 Submission for Prompting Large Language Models as Explainable Metrics Task. Eval4NLP (2023)
Nitin Ramrakhiyani, Vasudeva Varma, Girish K. Palshikar, Sachin Pawar: Zero-shot Probing of Pretrained Language Models for Geography Knowledge. Eval4NLP (2023)
Abbas Akkasi, Kathleen C. Fraser, Majid Komeili: Reference-Free Summarization Evaluation with Large Language Models. Eval4NLP (2023)
2022
Yunmeng Li, Jun Suzuki, Makoto Morishita, Kaori Abe, Ryoko Tokuhisa, Ana Brassard, Kentaro Inui: Chat Translation Error Detection for Assisting Cross-lingual Communications. Eval4NLP (2022)
Shohei Zhou, Alisha Zachariah, Devin Conathan, Jeffery Kline: Assessing Resource-Performance Trade-off of Natural Language Models using Data Envelopment Analysis. Eval4NLP (2022)
Parush Gera, Tempestt J. Neal: A Comparative Analysis of Stance Detection Approaches and Datasets. Eval4NLP (2022)
Proceedings of the 3rd Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2022, Online, November 20, 2022. Eval4NLP (2022)
Guanyi Chen, Fahime Same, Kees van Deemter: Assessing Neural Referential Form Selectors on a Realistic Multilingual Dataset. Eval4NLP (2022)
Shohei Higashiyama, Masao Ideuchi, Masao Utiyama, Yoshiaki Oida, Eiichiro Sumita: A Japanese Corpus of Many Specialized Domains for Word Segmentation and Part-of-Speech Tagging. Eval4NLP (2022)
Ryan Chi, Nathan Kim, Patrick Liu, Zander Lack, Ethan A. Chi: GLARE: Generative Left-to-right AdversaRial Examples. Eval4NLP (2022)
Roberta Rocca, Alejandro de la Vega: Evaluating the role of non-lexical markers in GPT-2's language modeling behavior. Eval4NLP (2022)
Juri Opitz, Anette Frank: Better Smatch = Better Parser? AMR evaluation is not so simple anymore. Eval4NLP (2022)
Zhengxiang Wang: Random Text Perturbations Work, but not Always. Eval4NLP (2022)
Mateusz Krubiński, Pavel Pecina: From COMET to COMES - Can Summary Evaluation Benefit from Translation Evaluation? Eval4NLP (2022)
Kaori Abe, Sho Yokoi, Tomoyuki Kajiwara, Kentaro Inui: Why is sentence similarity benchmark not predictive of application-oriented task performance? Eval4NLP (2022)
2021
Oskar Wysocki, Malina Florea, Dónal Landers, André Freitas: What is SemEval evaluating? A Systematic Analysis of Evaluation Campaigns in NLP. Eval4NLP (2021)
Benjamin Murauer, Günther Specht: Developing a Benchmark for Reducing Data Bias in Authorship Attribution. Eval4NLP (2021)
Heather Lent, Semih Yavuz, Tao Yu, Tong Niu, Yingbo Zhou, Dragomir Radev, Xi Victoria Lin: Testing Cross-Database Semantic Parsers With Canonical Utterances. Eval4NLP (2021)
Vivek Srivastava, Mayank Singh: HinGE: A Dataset for Generation and Evaluation of Code-Mixed Hinglish Text. Eval4NLP (2021)
Yang Liu, Alan Medlar, Dorota Glowacka: Statistically Significant Detection of Semantic Shifts using Contextual Word Embeddings. Eval4NLP (2021)
Qingkai Zeng, Mengxia Yu, Wenhao Yu, Tianwen Jiang, Meng Jiang: Validating Label Consistency in NER Data Annotation. Eval4NLP (2021)
Urja Khurana, Eric T. Nalisnick, Antske Fokkens: How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task. Eval4NLP (2021)
Peter Polák, Muskaan Singh, Ondrej Bojar: Explainable Quality Estimation: CUNI Eval4NLP Submission. Eval4NLP (2021)
Chester Palen-Michel, Nolan Holley, Constantine Lignos: SeqScore: Addressing Barriers to Reproducible Named Entity Recognition Evaluation. Eval4NLP (2021)
Marina Fomicheva, Piyawat Lertvittayakumjorn, Wei Zhao, Steffen Eger, Yang Gao: The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results. Eval4NLP (2021)
Yo Ehara: Evaluation of Unsupervised Automatic Readability Assessors Using Rank Correlations. Eval4NLP (2021)
Enzo Terreau, Antoine Gourru, Julien Velcin: Writing Style Author Embedding Evaluation. Eval4NLP (2021)
Melda Eksi, Erik Gelbing, Jonathan Stieber, Chi Viet Vu: Explaining Errors in Machine Translation with Absolute Gradient Ensembles. Eval4NLP (2021)
Lucie Gianola, Hicham El Boukkouri, Cyril Grouin, Thomas Lavergne, Patrick Paroubek, Pierre Zweigenbaum: Differential Evaluation: a Qualitative Analysis of Natural Language Processing System Behavior Based Upon Data Resistance to Processing. Eval4NLP (2021)
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2021, Punta Cana, Dominican Republic, November 10, 2021. Eval4NLP (2021)
Marcos V. Treviso, Nuno Miguel Guerreiro, Ricardo Rei, André F. T. Martins: IST-Unbabel 2021 Submission for the Explainable Quality Estimation Shared Task. Eval4NLP (2021)
Oleg V. Vasilyev, John Bohannon: ESTIME: Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings. Eval4NLP (2021)
Alexey Tikhonov, Igor Samenko, Ivan P. Yamshchikov: StoryDB: Broad Multi-language Narrative Dataset. Eval4NLP (2021)