Qwen2.5-0.5B-grpo-summarization-quality-meteor-rouge

The model was first trained with a length-only reward to learn output length control, then fine-tuned further with quality-only rewards (no explicit length penalty). This checkpoint used METEOR + ROUGE as the quality signal.
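The card does not say how METEOR and ROUGE were combined into the Stage 2 reward. The following is a hypothetical, dependency-free sketch. It assumes an equal-weight average of ROUGE-L F1 and a simplified METEOR that uses only exact unigram matches, a greedy alignment, and the standard Fmean and fragmentation penalty. The actual training code may instead rely on library implementations such as NLTK's `meteor_score` or the `rouge-score` package, and it may use a different ROUGE variant or different weights.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    # Classic O(len(a) * len(b)) dynamic programme for longest common subsequence.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l_f1(hyp: str, ref: str) -> float:
    # ROUGE-L F1 over lowercased whitespace tokens.
    h, r = hyp.lower().split(), ref.lower().split()
    lcs = lcs_length(h, r)
    if lcs == 0:
        return 0.0
    p, rec = lcs / len(h), lcs / len(r)
    return 2 * p * rec / (p + rec)


def simple_meteor(hyp: str, ref: str) -> float:
    # Simplified METEOR: exact matches only (no stemming or synonyms),
    # greedy left-to-right alignment, each reference token used at most once.
    h, r = hyp.lower().split(), ref.lower().split()
    used = [False] * len(r)
    alignment = []  # (hyp_index, ref_index) pairs, ordered by hyp_index
    for i, tok in enumerate(h):
        for j, ref_tok in enumerate(r):
            if not used[j] and tok == ref_tok:
                used[j] = True
                alignment.append((i, j))
                break
    m = len(alignment)
    if m == 0:
        return 0.0
    p, rec = m / len(h), m / len(r)
    f_mean = 10 * p * rec / (rec + 9 * p)  # recall-weighted harmonic mean
    # A chunk is a maximal run of matches that are adjacent in both hyp and ref.
    chunks = 1
    for (i0, j0), (i1, j1) in zip(alignment, alignment[1:]):
        if not (i1 == i0 + 1 and j1 == j0 + 1):
            chunks += 1
    penalty = 0.5 * (chunks / m) ** 3
    return f_mean * (1 - penalty)


def quality_reward(hyp: str, ref: str) -> float:
    # Hypothetical Stage 2 reward: equal-weight average of the two metrics.
    return 0.5 * (simple_meteor(hyp, ref) + rouge_l_f1(hyp, ref))


print(quality_reward("the cat sat", "the cat sat on the mat"))
```

Averaging keeps the reward in [0, 1]. Note that even an exact copy of the reference scores slightly below 1.0 on METEOR because of the fragmentation penalty.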

G-Eval Scores (each 0–1; Composite max 4.0)

| Faithfulness | Coverage | Conciseness | Clarity | Composite | Pass Rate |
|---|---|---|---|---|---|
| 0.853 | 0.489 | 0.692 | 0.762 | 2.796 | 38.3% |

Evaluated on 200 examples from the mlabonne/smoltldr test split. Judge: gpt-5-mini-2025-08-07 via DeepEval G-Eval, with each metric (0–1) averaged over 5 judging rounds.
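In every row of the tables on this card, the Composite column equals the unweighted sum of the four metric scores, up to rounding in the last digit. This is why its maximum is 4.0. Here is a minimal sketch of that arithmetic, assuming plain summation with no per-metric weights:

```python
# Per-metric G-Eval scores for this checkpoint, copied from the table above.
scores = {
    "faithfulness": 0.853,
    "coverage": 0.489,
    "conciseness": 0.692,
    "clarity": 0.762,
}


def composite(metric_scores: dict[str, float]) -> float:
    """Composite as the unweighted sum of four 0-1 metrics (max 4.0)."""
    return round(sum(metric_scores.values()), 3)


print(composite(scores))  # 2.796
```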

Eval Rollouts

Full per-example outputs, G-Eval scores, significance tests, and summary tables are in the dataset repo: YuvrajSingh9886/reddit-posts-summarization-grpo

All Quality-Only Runs (fine-tuned from the length-penalty checkpoint)

| Run | Faithfulness | Coverage | Conciseness | Clarity | Composite | Pass Rate |
|---|---|---|---|---|---|---|
| grpo-summarization-quality-bleu-rouge | 0.865 | 0.329 | 0.839 | 0.784 | 2.817 | 18.2% |
| grpo-summarization-quality-meteor-rouge | 0.853 | 0.489 | 0.692 | 0.762 | 2.796 | 38.3% |
| grpo-summarization-quality-rouge | 0.818 | 0.338 | 0.841 | 0.779 | 2.777 | 19.6% |
| grpo-summarization-quality-meteor-bleu | 0.933 | 0.716 | 0.322 | 0.763 | 2.734 | 26.1% |
| grpo-summarization-quality-meteor | 0.883 | 0.619 | 0.444 | 0.751 | 2.697 | 30.5% |
| grpo-summarization-quality-bleu | 0.722 | 0.439 | 0.575 | 0.678 | 2.414 | 32.1% |

Length-Penalty Included Runs (alternative strategy)

| Run | Faithfulness | Coverage | Conciseness | Clarity | Composite | Pass Rate |
|---|---|---|---|---|---|---|
| grpo-summarization-length-quality-meteor-rouge | 0.832 | 0.511 | 0.659 | 0.767 | 2.769 | 44.3% |
| grpo-summarization-length-quality-bleu-rouge | 0.810 | 0.502 | 0.650 | 0.770 | 2.732 | 39.1% |
| grpo-summarization-length-quality-meteor-bleu | 0.792 | 0.468 | 0.648 | 0.756 | 2.664 | 38.3% |
| grpo-summarization-length-quality-rouge | 0.725 | 0.415 | 0.637 | 0.778 | 2.555 | 32.4% |
| grpo-summarization-length-quality-meteor | 0.721 | 0.427 | 0.625 | 0.711 | 2.484 | |
| grpo-summarization-length-only | 0.678 | 0.407 | 0.592 | 0.739 | 2.416 | 30.7% |
| grpo-summarization-length-quality-bleu | 0.680 | 0.399 | 0.577 | 0.744 | 2.400 | 26.9% |
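To compare the two training strategies reward by reward, you can pair the matching rows of the two tables above. The run names are shortened to their reward suffix here, and grpo-summarization-length-only is left out because it has no quality-only counterpart. This small sketch uses only the Composite values reported on this card:

```python
# Composite scores copied from the two tables above, keyed by reward suffix.
quality_only = {
    "bleu-rouge": 2.817, "meteor-rouge": 2.796, "rouge": 2.777,
    "meteor-bleu": 2.734, "meteor": 2.697, "bleu": 2.414,
}
length_included = {
    "meteor-rouge": 2.769, "bleu-rouge": 2.732, "meteor-bleu": 2.664,
    "rouge": 2.555, "meteor": 2.484, "bleu": 2.400,
}

# Positive delta means the quality-only run scored higher on Composite.
deltas = {k: round(quality_only[k] - length_included[k], 3) for k in quality_only}
for reward, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{reward:12s} {delta:+.3f}")
```

Every paired delta is positive, so the quality-only run has the higher Composite for each reward combination. Pass Rate does not follow the same pattern. Among the pairs where both values are reported, the quality-only run has the lower Pass Rate in most cases (for example, 38.3% vs 44.3% for METEOR + ROUGE).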

Usage (MLX)

```python
from mlx_lm import load, generate

# Download (if needed) and load the MLX weights and tokenizer from the Hub.
model, tokenizer = load("YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-quality-meteor-rouge")

messages = [
    # The system prompt asks for a summary of exactly 50 words.
    {"role": "system", "content": "You are an assistant who is an expert at summarization task. The user gives you a post and you are required to summarize it, keeping the key points and main ideas intact, in EXACTLY 50 words"},
    {"role": "user",   "content": "<paste Reddit post here>"},
]
# Render the chat template as a string and append the assistant-turn header.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Cap generation at 128 new tokens and print the summary.
print(generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=False))
```
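The prompt asks for exactly 50 words, but the Stage 1 training target is listed as 50 tokens, so output length will only be approximately controlled. Here is a minimal, hypothetical helper for checking how far a generated summary lands from the requested word count. It assumes words are whitespace-separated:

```python
TARGET_WORDS = 50  # matches "EXACTLY 50 words" in the system prompt


def word_count_deviation(summary: str, target: int = TARGET_WORDS) -> int:
    """Return how many words the summary is over (+) or under (-) the target.

    Splitting on whitespace is a simplifying assumption; it will not match
    the token counts used by the Stage 1 length reward.
    """
    return len(summary.split()) - target


# Example with a stand-in 48-word summary.
example = " ".join(["word"] * 48)
print(word_count_deviation(example))  # -2
```

To check a real output, pass in the string returned by the `generate(...)` call in the usage example above instead of printing it.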

Training Details

| Setting | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B-Instruct |
| Algorithm | GRPO (via smolcluster) |
| Dataset | mlabonne/smoltldr (train split, Reddit summarization) |
| Stage 1 reward | Length penalty (deviation from 50-token target) |
| Stage 2 reward | METEOR + ROUGE |
| Hardware | Apple Silicon Mac mini cluster |
| Framework | MLX |
| Weights format | MLX safetensors (bf16) |
| Eval examples | 200 (test split) |
| Judge | gpt-5-mini-2025-08-07 via DeepEval G-Eval |
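GRPO scores each sampled completion relative to the other completions drawn for the same prompt, instead of using a learned value model. The sketch below illustrates that group-relative advantage step with a Stage 1 style length reward. The card states only that Stage 1 penalized deviation from a 50-token target, so the exact reward shape used here, `-|n - 50| / 50`, is a hypothetical choice. The card also does not say whether smolcluster normalizes with the population or the sample standard deviation, and the policy-gradient update, clipping, and KL terms are omitted.

```python
import statistics

TARGET_TOKENS = 50  # Stage 1 target from the table above


def length_reward(num_tokens: int, target: int = TARGET_TOKENS) -> float:
    """Hypothetical Stage 1 reward: 0 at the target, more negative further away."""
    return -abs(num_tokens - target) / target


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, a group of four sampled completions with these token counts.
token_counts = [50, 42, 61, 80]
rewards = [length_reward(n) for n in token_counts]
advantages = group_relative_advantages(rewards)
for n, r, a in zip(token_counts, rewards, advantages):
    print(f"{n:3d} tokens  reward={r:+.2f}  advantage={a:+.2f}")
```

Within the group, the completion closest to 50 tokens gets the largest positive advantage, and the 80-token one gets the most negative. Because advantages are standardized per group, they sum to roughly zero. In Stage 2, the same step would apply with a quality reward such as METEOR + ROUGE in place of `length_reward`.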