
AweAgent-Meta-NL2Repo

This dataset provides the metadata used by AweAgent to run the NL2RepoBench evaluation.

If you are looking for the underlying benchmark itself (task design, repositories, test suites), please refer to the original project: multimodal-art-projection/NL2RepoBench.

Purpose

The AweAgent repository evaluates end-to-end, repository-level code generation: given a natural-language project specification, the agent must produce a working Python package that passes the project's tests. To run this evaluation reproducibly, the harness needs a compact, machine-readable manifest of every instance, and this dataset provides exactly that.

Concretely, each row tells AweAgent:

- which prebuilt evaluation Docker image to launch,
- the natural-language prompt that defines the target repository,
- the verification commands and the files that must be produced,
- the number of test cases the generated repository will be scored against.

Files

`nl2repo_aweagent.jsonl`: one JSON object per NL2RepoBench instance (104 instances).
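If you prefer to read the JSONL file directly rather than through the `datasets` library, a minimal standard-library sketch is below. It assumes the file has already been downloaded locally; `iter_instances` is a hypothetical helper name, and the two in-memory records are placeholders rather than real rows.

```python
import io
import json
from typing import Any, Iterable, Iterator


def iter_instances(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one instance record per non-blank line of a JSONL stream."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # Report the 1-based line number to make a corrupt row easy to find.
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        yield record


# For the real file (assumed to be in the working directory):
#     with open("nl2repo_aweagent.jsonl", encoding="utf-8") as f:
#         instances = list(iter_instances(f))
# Here an in-memory stand-in with two placeholder records is used instead.
sample = io.StringIO('{"instance_id": "schema"}\n\n{"instance_id": "funcy"}\n')
print([row["instance_id"] for row in iter_instances(sample)])  # ['schema', 'funcy']
```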

Schema

| Field | Type | Description |
|---|---|---|
| `instance_id` | `str` | Unique identifier for the instance (typically the target package name). |
| `package_name` | `str` | The Python package the agent is expected to generate. |
| `evaluation_image` | `str` | Docker image (hosted under `ghcr.io/multimodal-art-projection/nl2repobench`, e.g. `ghcr.io/multimodal-art-projection/nl2repobench/schema:1.0`) used to evaluate the generated repository. |
| `start_instruction` | `str` | The natural-language task description handed to the agent as the starting prompt. |
| `verify_files` | `list[str]` | Files that must be produced by the agent and are checked during verification. |
| `verify_cmd` | `list[str]` | The shell commands (one to four per instance) executed in sequence inside the evaluation image to verify the generated repository, e.g. `pip install -e .` followed by a `pytest` invocation. |
| `test_cases_num` | `int` | Number of test cases used to score the instance. |
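To catch malformed rows early, a harness could load each record into a typed structure that mirrors the fields above. The sketch below assumes Python 3.9+; `NL2RepoInstance` and `from_row` are hypothetical names, not part of AweAgent. It accepts `verify_cmd` either as a single string or as a list, since the published rows store a list of commands (e.g. `["pip install -e .", "pytest --continue-on-collection-errors tests"]`).

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class NL2RepoInstance:
    """Hypothetical typed view of one manifest row."""
    instance_id: str
    package_name: str
    evaluation_image: str
    start_instruction: str
    verify_files: list[str]
    verify_cmd: list[str]
    test_cases_num: int

    @classmethod
    def from_row(cls, row: dict[str, Any]) -> "NL2RepoInstance":
        cmd = row["verify_cmd"]
        # Normalize: wrap a bare command string in a one-element list.
        cmds = [cmd] if isinstance(cmd, str) else list(cmd)
        inst = cls(
            instance_id=str(row["instance_id"]),
            package_name=str(row["package_name"]),
            evaluation_image=str(row["evaluation_image"]),
            start_instruction=str(row["start_instruction"]),
            verify_files=list(row["verify_files"]),
            verify_cmd=cmds,
            test_cases_num=int(row["test_cases_num"]),
        )
        # Basic sanity checks on the fields the harness depends on.
        if not inst.verify_cmd:
            raise ValueError(f"{inst.instance_id}: verify_cmd is empty")
        if inst.test_cases_num <= 0:
            raise ValueError(f"{inst.instance_id}: test_cases_num must be positive")
        return inst


# Values from the `schema` instance (prompt truncated).
row = {
    "instance_id": "schema",
    "package_name": "schema",
    "evaluation_image": "ghcr.io/multimodal-art-projection/nl2repobench/schema:1.0",
    "start_instruction": "## Schema Project Introduction and Goals ...",
    "verify_files": ["tests"],
    "verify_cmd": ["pip install -e .", "pytest --continue-on-collection-errors test_schema.py"],
    "test_cases_num": 118,
}
inst = NL2RepoInstance.from_row(row)
print(inst.package_name, len(inst.verify_cmd), inst.test_cases_num)  # schema 2 118
```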

Usage

```python
from datasets import load_dataset

ds = load_dataset("AweAI-Team/AweAgent-Meta-NL2Repo", split="train")
print(ds[0])
```

This manifest is consumed by the evaluation pipeline in AweAgent (https://github.com/AweAI-Team/AweAgent); see that repository for the full runner, scoring logic, and reproduction instructions.
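As a rough illustration of how a row might drive a verification run, the sketch below builds a `docker run` argument list from `evaluation_image` and `verify_cmd`. It is not AweAgent's runner. The `/workspace` mount point, running the commands through `sh -c` chained with `&&`, and the helper name `build_verify_argv` are all assumptions. Copying `verify_files` into place and parsing test results are omitted.

```python
import os


def build_verify_argv(
    image: str,
    verify_cmd: list[str],
    repo_dir: str,
    workdir: str = "/workspace",  # assumption: hypothetical mount point in the container
) -> list[str]:
    """Build a hypothetical `docker run` argv that runs verify_cmd in order."""
    if not verify_cmd:
        raise ValueError("verify_cmd must contain at least one command")
    # Chain with && so a failed step (e.g. pip install) stops the run early.
    script = " && ".join(verify_cmd)
    return [
        "docker", "run", "--rm",
        # Bind-mount the generated repository; an absolute host path is
        # always treated as a bind mount rather than a named volume.
        "-v", f"{os.path.abspath(repo_dir)}:{workdir}",
        "-w", workdir,
        image,
        "sh", "-c", script,
    ]


argv = build_verify_argv(
    "ghcr.io/multimodal-art-projection/nl2repobench/schema:1.0",
    ["pip install -e .", "pytest --continue-on-collection-errors test_schema.py"],
    "/tmp/generated/schema",  # hypothetical output directory of the agent
)
print(argv[-1])  # pip install -e . && pytest --continue-on-collection-errors test_schema.py
# To actually run it (requires Docker and the image): subprocess.run(argv)
```

With `&&` chaining, a failed install ends the run before pytest starts. A real harness may prefer to run each command separately and inspect its exit status.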

Acknowledgements

This dataset is built on top of, and would not exist without, the excellent NL2RepoBench project by the Multimodal Art Projection team. All benchmark instances, evaluation images, and test cases originate from their work; this dataset only repackages the per-instance metadata in the form AweAgent's evaluation harness expects. Huge thanks to the NL2RepoBench authors for releasing such a high-quality repository-level code-generation benchmark.

License

Released under CC BY 4.0. When using this dataset, please also cite and credit the upstream NL2RepoBench project.
