Datasets (4,916)

Active filters: benchmark

umutcaned/turkreason

Viewer • Updated Apr 8 • 5.11k • 129 • 11

tencent/workbuddy-bench

Updated 4 days ago • 250 • 7

IntelligenceLab/Long-Horizon-Terminal-Bench

Viewer • Updated 5 days ago • 46 • 2.59k • 118

BananaMind/BananaMind-Base-Bench-1.1

Viewer • Updated 1 day ago • 350 • 49 • 6

llamaindex/ParseBench

Benchmark • Updated Apr 19 • 169k • 11k • 110

datacurve/deep-swe

Benchmark • Updated Jun 2 • 113 • 644 • 25

ZuoHaotong/FilmEval

Viewer • Updated 3 days ago • 90 • 377 • 5

google/deepsearchqa

Viewer • Updated Dec 17, 2025 • 900 • 35.2k • 126

actava/chi-bench

Benchmark • Updated Jun 2 • 101 • 1.62k • 59

Qwen/AgentWorldBench

Viewer • Updated 21 days ago • 2.17k • 2.74k • 85

openai/genebench-pro-public-package

Viewer • Updated 25 days ago • 10 • 169 • 3

ajh-oai/genebench-pro-public-package

Viewer • Updated 25 days ago • 10 • 1.58k • 12

t-tech/SynthComp

Viewer • Updated 9 days ago • 790 • 169 • 13

t-tech/TRuST

Viewer • Updated 9 days ago • 324 • 184 • 16

activevisionai/ActiveVision

Viewer • Updated 6 days ago • 85 • 899 • 3

jumplander/JL-Agentic-Behavior-10K

Preview • Updated 4 days ago • 409 • 3

Multilingual-Multimodal-NLP/FinanceComplexQA

Updated 1 day ago • 53 • 3

BestWishYsh/OpenS2V-5M

Updated Jan 6 • 23.3k • 29

Qwen/Qwen-Image-Bench

Viewer • Updated May 28 • 1k • 9.38k • 43

makora-ai/triton-gpu-latency

Viewer • Updated Jun 12 • 601k • 274 • 17

Rapidata/svg-benchmark

Viewer • Updated 26 days ago • 189k • 2.25k • 30

ByteDance-Seed/EdgeBench

Viewer • Updated 16 days ago • 51 • 10.5k • 79

sy-xie/robovista

Viewer • Updated 19 days ago • 474 • 206 • 4

besimple-ai/vocal-affect-bench

Viewer • Updated 5 days ago • 280 • 731 • 4

HelpMum-Personal/MamaBench

Viewer • Updated 9 days ago • 434 • 24 • 2

kaus4004/robust-humanoid-ppo-results

Viewer • Updated 3 days ago • 3 • 90 • 2

michaljunczyk/pl-asr-bigos

Updated Jan 8, 2024 • 56 • 5

ikala/tmmluplus

Viewer • Updated Sep 4, 2025 • 22.7k • 1.24k • 133

tranthaihoa/vifactcheck

Viewer • Updated Mar 27 • 7.23k • 414 • 8

llm-lab/TuringQ

Viewer • Updated Oct 3, 2024 • 4.01k • 20 • 4