Column summary:

| Column | Type | Min | Max |
|---|---|---|---|
| evaluation_image | string (length) | 54 | 73 |
| instance_id | string (length) | 3 | 22 |
| package_name | string (length) | 3 | 46 |
| start_instruction | string (length) | 1.17k | 32.7k |
| test_cases_num | int64 | 4 | 2.18k |
| verify_cmd | list (length) | 1 | 4 |
| verify_files | list (length) | 1 | 12 |

Preview rows (long values are truncated by the viewer):

evaluation_image: ghcr.io/multimodal-art-projection/nl2repobench/schema:1.0
instance_id: schema
package_name: schema
start_instruction: ## Schema Project Introduction and Goals Schema is a lightweight library **for Python data structure validation**. It can parse and validate various data formats (supporting Python native data structures such as dictionaries, lists, tuples, and sets) and ensure that the data conforms to a predefined schema. This tool ...
test_cases_num: 118
verify_cmd: [ "pip install -e .", "pytest --continue-on-collection-errors test_schema.py" ]
verify_files: [ "tests" ]

evaluation_image: ghcr.io/multimodal-art-projection/nl2repobench/funcy:1.0
instance_id: funcy
package_name: funcy
start_instruction: # Introduction and Goals of the Funcy Project Funcy is a utility library for functional programming in Python, providing Python developers with rich functional programming abstractions and practical tools. It supports various scenarios such as collection operations, function composition, flow control, and debugging to...
test_cases_num: 203
verify_cmd: [ "pip install -e .", "pytest --continue-on-collection-errors tests" ]
verify_files: [ "tests" ]

evaluation_image: ghcr.io/multimodal-art-projection/nl2repobench/cherry:1.0
instance_id: cherry
package_name: Cherry
start_instruction: ## Introduction and Goals of the Cherry Project Cherry is a lightweight Python library **for text classification** that enables users without machine learning knowledge to quickly train a high-accuracy model within 5 minutes. This tool aims to significantly lower the threshold for text classification tasks, allowing d...
test_cases_num: 34
verify_cmd: [ "pip install -e .", "pytest --continue-on-collection-errors tests" ]
verify_files: [ "tests" ]

evaluation_image: ghcr.io/multimodal-art-projection/nl2repobench/decouple:1.0
instance_id: decouple
package_name: decouple
start_instruction: ## Introduction and Goals of the Python-Decouple Project Python-Decouple is a Python library **oriented towards configuration management separation**. It can achieve strict separation between code and configuration, support reading configuration parameters from environment variables, .env files, and .ini files, and pr...
test_cases_num: 67
verify_cmd: [ "pip install -e .", "pytest --continue-on-collection-errors tests" ]
verify_files: [ "tests" ]

evaluation_image: ghcr.io/multimodal-art-projection/nl2repobench/jinja:1.0
instance_id: jinja
package_name: jinja
start_instruction: ## Introduction and Goals of the Jinja2 Project Jinja2 is a **fast and expressive template engine** written in pure Python. It offers a non-XML syntax, supports inline expressions, and provides an optional sandbox environment. This engine is widely used in scenarios such as web development, configuration generation, a...
test_cases_num: 911
verify_cmd: [ "echo Hello >> README.rst", "echo Hello >> README.md", "pip install -e .", "pytest --continue-on-collection-errors tests" ]
verify_files: [ "tests" ]

evaluation_image: ghcr.io/multimodal-art-projection/nl2repobench/freezegun:1.0
instance_id: freezegun
package_name: freezegun
start_instruction: "## Introduction and Goals of the freezegun Project\n\nfreezegun is a time-freezing library for Pyth(...TRUNCATED)
test_cases_num: 133
verify_cmd: [ "pip install -e .", "pytest --continue-on-collection-errors tests" ]
verify_files: [ "tests" ]

evaluation_image: ghcr.io/multimodal-art-projection/nl2repobench/cerberus:1.0
instance_id: cerberus
package_name: cerberus
start_instruction: "## Project Introduction and Goals\n\n**Cerberus** is a lightweight and extensible Python data valid(...TRUNCATED)
test_cases_num: 249
verify_cmd: ["pip install -e .","pytest --continue-on-collection-errors cerberus/tests cerberus/benchmarks/test_(...TRUNCATED)
verify_files: [ "tests" ]

evaluation_image: ghcr.io/multimodal-art-projection/nl2repobench/mechanicalsoup:1.0
instance_id: mechanicalsoup
package_name: mechanicalsoup
start_instruction: "# Introduction to the MechanicalSoup_main Project\n\n## 1. Project Overview and Objectives\n\nMecha(...TRUNCATED)
test_cases_num: 140
verify_cmd: [ "pip install -e .", "pytest --continue-on-collection-errors tests" ]
verify_files: [ "tests" ]

evaluation_image: ghcr.io/multimodal-art-projection/nl2repobench/ipytest:1.0
instance_id: ipytest
package_name: ipytest
start_instruction: "## Introduction and Goals of the ipytest Project\n\nipytest is a Python library for test execution (...TRUNCATED)
test_cases_num: 81
verify_cmd: [ "pip install -e .", "pytest --continue-on-collection-errors tests" ]
verify_files: [ "tests" ]

evaluation_image: ghcr.io/multimodal-art-projection/nl2repobench/tinydb:1.0
instance_id: tinydb
package_name: tinydb
start_instruction: "## Introduction and Goals of the TinyDB Project\n\nTinyDB is a **lightweight document-oriented data(...TRUNCATED)
test_cases_num: 203
verify_cmd: [ "pip install -e .", "pytest --continue-on-collection-errors tests" ]
verify_files: [ "tests" ]

AweAgent-Meta-NL2Repo

This dataset provides the metadata used by AweAgent to run the NL2RepoBench evaluation.

If you are looking for the underlying benchmark itself (task design, repositories, test suites), please refer to the original project: multimodal-art-projection/NL2RepoBench.

Purpose

The AweAgent repository evaluates end-to-end repo-level code generation: given a natural-language project specification, the agent must produce a working Python package that passes the project's tests. To run that evaluation reproducibly, the harness needs a compact, machine-readable manifest of every instance, and this dataset is that manifest.

Concretely, each row tells AweAgent:

  • which prebuilt evaluation Docker image to launch,
  • the NL prompt that defines the target repository,
  • the verification command and the files that must be produced,
  • the number of test cases the generated repo will be scored against.
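
As a rough illustration of how a harness could act on these fields, the sketch below turns one row into a `docker run` argument list. The function name `build_docker_argv`, the `/workspace` mount point, the host path, and the choice to chain the commands with `&&` under `sh -c` are all assumptions made for this example; AweAgent's actual runner may work differently.

```python
import shlex


def build_docker_argv(row, repo_dir, workdir="/workspace"):
    """Build a `docker run` argv that mounts repo_dir and runs the row's verify commands.

    Hypothetical helper: the mount point and shell invocation are assumptions.
    """
    cmds = row["verify_cmd"]
    if isinstance(cmds, str):  # tolerate a single command given as a plain string
        cmds = [cmds]
    script = " && ".join(cmds)  # stop at the first failing command
    return [
        "docker", "run", "--rm",
        "-v", f"{repo_dir}:{workdir}",  # mount the generated repository
        "-w", workdir,                  # run the commands from inside it
        row["evaluation_image"],
        "sh", "-c", script,
    ]


# Values copied from the funcy preview row; the host path is made up.
example_row = {
    "evaluation_image": "ghcr.io/multimodal-art-projection/nl2repobench/funcy:1.0",
    "verify_cmd": ["pip install -e .", "pytest --continue-on-collection-errors tests"],
}
argv = build_docker_argv(example_row, "/tmp/generated/funcy")
print(shlex.join(argv))
```

Returning an argument list rather than a single shell string keeps quoting explicit and lets the caller pass it directly to `subprocess.run`.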

Files

  • nl2repo_aweagent.jsonl: one JSON object per NL2RepoBench instance (104 instances).
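
Because the file is plain JSON Lines, it can also be read with the standard library alone. The sketch below is one possible way to do that; the helper name `parse_manifest` is an assumption, and the sample lines are abbreviated stand-ins that keep only three fields from the preview rows.

```python
import json


def parse_manifest(lines):
    """Parse JSONL lines into a dict keyed by instance_id, skipping blank lines."""
    rows = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        iid = row["instance_id"]
        if iid in rows:  # instance_id is documented as unique
            raise ValueError(f"duplicate instance_id {iid!r} on line {lineno}")
        rows[iid] = row
    return rows


# Abbreviated sample lines; real rows carry all seven fields.
sample = [
    '{"instance_id": "schema", "package_name": "schema", "test_cases_num": 118}',
    "",
    '{"instance_id": "funcy", "package_name": "funcy", "test_cases_num": 203}',
]
manifest = parse_manifest(sample)
total_tests = sum(r["test_cases_num"] for r in manifest.values())
print(sorted(manifest), total_tests)
```

With a local copy of the file, the same function accepts an open file handle, for example `parse_manifest(open("nl2repo_aweagent.jsonl", encoding="utf-8"))`.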

Schema

| Field | Type | Description |
|---|---|---|
| instance_id | str | Unique identifier for the instance (typically the target package name). |
| package_name | str | The Python package the agent is expected to generate. |
| evaluation_image | str | Docker image (hosted under ghcr.io/multimodal-art-projection/nl2repobench) used to evaluate the generated repository. |
| start_instruction | str | The natural-language task description handed to the agent as the starting prompt. |
| verify_files | list[str] | Files that must be produced by the agent and that are checked during verification. |
| verify_cmd | list[str] | Shell commands executed inside the evaluation image to verify the generated repository (for example, `pip install -e .` followed by a `pytest` invocation). |
| test_cases_num | int | Number of test cases used to score the instance. |
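
A lightweight check against this schema can catch malformed rows before an evaluation run starts. The sketch below is an assumption-laden example rather than part of AweAgent: the names `EXPECTED_TYPES` and `validate_row` are hypothetical, and `verify_cmd` is checked as a list of strings because that is how the preview rows store it.

```python
EXPECTED_TYPES = {
    "instance_id": str,
    "package_name": str,
    "evaluation_image": str,
    "start_instruction": str,
    "verify_files": list,
    "verify_cmd": list,  # the preview rows store a list of command strings
    "test_cases_num": int,
}


def validate_row(row):
    """Return a list of problems; an empty list means the row looks well-formed."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in row:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], expected):
            actual = type(row[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    # List-typed fields must contain only strings.
    for field in ("verify_files", "verify_cmd"):
        value = row.get(field)
        if isinstance(value, list) and not all(isinstance(v, str) for v in value):
            problems.append(f"{field}: every element must be a str")
    return problems


# Values from the tinydb preview row; start_instruction is a placeholder.
good_row = {
    "instance_id": "tinydb",
    "package_name": "tinydb",
    "evaluation_image": "ghcr.io/multimodal-art-projection/nl2repobench/tinydb:1.0",
    "start_instruction": "(placeholder for the full task description)",
    "verify_files": ["tests"],
    "verify_cmd": ["pip install -e .", "pytest --continue-on-collection-errors tests"],
    "test_cases_num": 203,
}
bad_row = dict(good_row, verify_cmd="pytest tests")  # a bare string, not a list

print(validate_row(good_row))
print(validate_row(bad_row))
```

Collecting every problem instead of raising on the first one makes it easier to report all issues in a manifest in a single pass.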

Usage

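from datasets import load_dataset

# Load the manifest from the Hugging Face Hub; the JSONL file is exposed as the "train" split.
ds = load_dataset("AweAI-Team/AweAgent-Meta-NL2Repo", split="train")
# Each row is a dict with the fields listed under Schema.
print(ds[0])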

This manifest is consumed by the evaluation pipeline in AweAgent; see that repository for the full runner, scoring logic, and reproduction instructions.

Acknowledgements

This dataset is built on top of, and would not exist without, the excellent NL2RepoBench project by the Multimodal Art Projection team. All benchmark instances, evaluation images, and test cases originate from their work; this dataset only repackages the per-instance metadata in the form AweAgent's evaluation harness expects. Huge thanks to the NL2RepoBench authors for releasing such a high-quality repository-level code-generation benchmark.

License

Released under CC BY 4.0. When using this dataset, please also cite and credit the upstream NL2RepoBench project.
