
LEPISZCZE
A 14-task NLP benchmark for Polish with unified evaluation infrastructure and public leaderboard.
LEPISZCZE is an NLP benchmark for Polish, published at NeurIPS 2022 (Datasets & Benchmarks track). It unifies evaluation across 14 distinct NLP tasks, including sentiment analysis, named entity recognition, part-of-speech tagging, textual entailment, and question answering.
What it is
The benchmark integrates publicly available Polish datasets into a single evaluation framework with a public leaderboard. It covers four task types:
- Classification tasks (sentiment analysis, abusive clause detection)
- Sequence labeling (NER, POS tagging, political advertising detection)
- Pair classification (textual entailment, question answering, text summarization)
- Extractive question answering (SQuAD 2.0 style on Polish Wikipedia and NKJP)
The repository provides reproducible experiment configurations, hyperparameter search setups, and baseline results logged on Weights & Biases.
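To illustrate what a unified evaluation setup can look like, here is a minimal sketch in Python. It is not the LEPISZCZE implementation: the `TaskConfig` class, the `evaluate` function, the task names, and the choice of macro-averaged F1 as the metric are all assumptions made for the example. The idea is only that each task declares its type and metric once, so every model is scored through the same code path.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Macro-averaged F1 over all labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    scores: List[float] = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical per-task configuration (names are illustrative)."""
    name: str
    task_type: str  # "classification", "pair_classification", "sequence_labeling"
    metric: Callable[[Sequence[str], Sequence[str]], float]


def evaluate(task: TaskConfig, gold: Sequence, pred: Sequence) -> Dict[str, object]:
    """Score one task; sequence labeling is flattened to token level first."""
    if task.task_type == "sequence_labeling":
        # Every predicted sequence must align with its gold sequence.
        if len(gold) != len(pred) or any(len(g) != len(p) for g, p in zip(gold, pred)):
            raise ValueError("sequence lengths must match")
        gold = [tag for sent in gold for tag in sent]
        pred = [tag for sent in pred for tag in sent]
    return {"task": task.name, "task_type": task.task_type,
            "score": task.metric(gold, pred)}


# Hypothetical tasks, one per code path.
sentiment = TaskConfig("example-sentiment", "classification", macro_f1)
tagging = TaskConfig("example-pos", "sequence_labeling", macro_f1)

sentiment_result = evaluate(sentiment,
                            ["pos", "neg", "pos", "neg"],
                            ["pos", "neg", "neg", "neg"])
tagging_result = evaluate(tagging,
                          [["NOUN", "VERB"], ["ADJ", "NOUN"]],
                          [["NOUN", "VERB"], ["ADJ", "NOUN"]])
```

Keeping the metric inside the task configuration means a new dataset can be added by registering one more `TaskConfig`, without touching the scoring code.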
My role
Co-authored the benchmark design and implementation, contributing to dataset curation and integration and to the standardization of task definitions. The work involved coordination among 11 collaborators to establish unified evaluation protocols and baseline results across all 14 tasks.