Peter Rupnik
Publication Activity (10 Years)
Years Active: 2021-2024
Publications (10 Years): 13
Top Topics
Training Dataset
Top Venues
CoRR
LREC/COLING
VarDial@EACL
EAMT
Publications
Nikola Ljubesic, Vít Suchomel, Peter Rupnik, Taja Kuzman, Rik van Noord
Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining.
CoRR
(2024)
Rik van Noord, Taja Kuzman, Peter Rupnik, Nikola Ljubesic, Miquel Esplà-Gomis, Gema Ramírez-Sánchez, Antonio Toral
Do Language Models Care about Text Quality? Evaluating Web-Crawled Corpora across 11 Languages.
LREC/COLING
(2024)
Michal Mochtak, Peter Rupnik, Nikola Ljubesic
The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings.
LREC/COLING
(2024)
Rik van Noord, Taja Kuzman, Peter Rupnik, Nikola Ljubesic, Miquel Esplà-Gomis, Gema Ramírez-Sánchez, Antonio Toral
Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages.
CoRR
(2024)
Peter Rupnik, Taja Kuzman, Nikola Ljubesic
BENCHić-lang: A Benchmark for Discriminating between Bosnian, Croatian, Montenegrin and Serbian.
VarDial@EACL
(2023)
Michal Mochtak, Peter Rupnik, Nikola Ljubesic
The ParlaSent multilingual training dataset for sentiment identification in parliamentary proceedings.
CoRR
(2023)
Taja Kuzman, Peter Rupnik, Nikola Ljubesic
Get to Know Your Parallel Data: Performing English Variety and Genre Classification over MaCoCu Corpora.
VarDial@EACL
(2023)
Marta Bañón, Malina Chichirau, Miquel Esplà-Gomis, Mikel L. Forcada, Aarón Galiano Jiménez, Taja Kuzman, Nikola Ljubesic, Rik van Noord, Leopoldo Pla Sempere, Gema Ramírez-Sánchez, Peter Rupnik, Vít Suchomel, Antonio Toral, Jaume Zaragoza-Bernabeu
MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.
EAMT
(2023)
Michal Mochtak, Peter Rupnik, Nikola Ljubesic
The ParlaSent-BCS dataset of sentiment-annotated parliamentary debates from Bosnia-Herzegovina, Croatia, and Serbia.
CoRR
(2022)
Taja Kuzman, Peter Rupnik, Nikola Ljubesic
The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild.
CoRR
(2022)
Marta Bañón, Miquel Esplà-Gomis, Mikel L. Forcada, Cristian García-Romero, Taja Kuzman, Nikola Ljubesic, Rik van Noord, Leopoldo Pla Sempere, Gema Ramírez-Sánchez, Peter Rupnik, Vít Suchomel, Antonio Toral, Tobias van der Werff, Jaume Zaragoza
MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.
EAMT
(2022)
Taja Kuzman, Peter Rupnik, Nikola Ljubesic
The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild.
LREC
(2022)
Martin Znidarsic, Aljaz Osojnik, Peter Rupnik, Bernard Zenko
Improving Effectiveness of a Coaching System Through Preference Learning.
PETRA
(2021)