How to use YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-quality-meteor-rouge with MLX:
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir Qwen2.5-0.5B-grpo-summarization-quality-meteor-rouge YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-quality-meteor-rouge
The model was first trained with a length-only reward to learn output length control, then fine-tuned further with quality-only rewards (no explicit length penalty). This checkpoint used METEOR + ROUGE as the quality signal.
| Faithfulness | Coverage | Conciseness | Clarity | Composite | Pass Rate |
|---|---|---|---|---|---|
| 0.853 | 0.489 | 0.692 | 0.762 | 2.796 | 38.3% |
Evaluated on 200 examples from mlabonne/smoltldr test split · judge: gpt-5-mini-2025-08-07 via DeepEval G-Eval (5 rounds averaged, each metric 0–1)
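The Composite column appears to be the plain sum of the four metric scores (0.853 + 0.489 + 0.692 + 0.762 = 2.796 for this checkpoint), which would put it on a 0 to 4 scale. The sketch below shows that arithmetic, assuming this reading is correct. The helper names `average_rounds` and `composite` are illustrative and are not part of DeepEval.

```python
from statistics import mean

METRICS = ("faithfulness", "coverage", "conciseness", "clarity")


def average_rounds(round_scores):
    """Average one metric's scores over repeated judge rounds (each in 0-1)."""
    return mean(round_scores)


def composite(scores):
    """Sum the four per-metric scores (assumed definition of Composite)."""
    return sum(scores[m] for m in METRICS)


# Averaged scores for this checkpoint, copied from the table above.
row = {
    "faithfulness": 0.853,
    "coverage": 0.489,
    "conciseness": 0.692,
    "clarity": 0.762,
}
print(round(composite(row), 3))  # 2.796
```

The same sum reproduces the Composite value of the other rows to within rounding.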
Full per-example outputs, G-Eval scores, significance tests, and summary tables are in the dataset repo:
YuvrajSingh9886/reddit-posts-summarization-grpo
| Run | Faithfulness | Coverage | Conciseness | Clarity | Composite | Pass Rate |
|---|---|---|---|---|---|---|
| grpo-summarization-quality-bleu-rouge ⭐ | 0.865 | 0.329 | 0.839 | 0.784 | 2.817 | 18.2% |
| grpo-summarization-quality-meteor-rouge | 0.853 | 0.489 | 0.692 | 0.762 | 2.796 | 38.3% |
| grpo-summarization-quality-rouge | 0.818 | 0.338 | 0.841 | 0.779 | 2.777 | 19.6% |
| grpo-summarization-quality-meteor-bleu | 0.933 | 0.716 | 0.322 | 0.763 | 2.734 | 26.1% |
| grpo-summarization-quality-meteor | 0.883 | 0.619 | 0.444 | 0.751 | 2.697 | 30.5% |
| grpo-summarization-quality-bleu | 0.722 | 0.439 | 0.575 | 0.678 | 2.414 | 32.1% |
| Run | Faithfulness | Coverage | Conciseness | Clarity | Composite | Pass Rate |
|---|---|---|---|---|---|---|
| grpo-summarization-length-quality-meteor-rouge ⭐ | 0.832 | 0.511 | 0.659 | 0.767 | 2.769 | 44.3% |
| grpo-summarization-length-quality-bleu-rouge | 0.810 | 0.502 | 0.650 | 0.770 | 2.732 | 39.1% |
| grpo-summarization-length-quality-meteor-bleu | 0.792 | 0.468 | 0.648 | 0.756 | 2.664 | 38.3% |
| grpo-summarization-length-quality-rouge | 0.725 | 0.415 | 0.637 | 0.778 | 2.555 | 32.4% |
| grpo-summarization-length-quality-meteor | 0.721 | 0.427 | 0.625 | 0.711 | 2.484 | — |
| grpo-summarization-length-only | 0.678 | 0.407 | 0.592 | 0.739 | 2.416 | 30.7% |
| grpo-summarization-length-quality-bleu | 0.680 | 0.399 | 0.577 | 0.744 | 2.400 | 26.9% |
from mlx_lm import load, generate

# Load the MLX weights and tokenizer from the Hub (downloaded on first use)
model, tokenizer = load("YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-quality-meteor-rouge")

# Chat messages: the system prompt sets the task, the user turn carries the post
messages = [
    {"role": "system", "content": "You are an assistant who is an expert at summarization task. The user gives you a post and you are required to summarize it, keeping the key points and main ideas intact, in EXACTLY 50 words"},
    {"role": "user", "content": "<paste Reddit post here>"},
]

# Render the chat template to a prompt string, then generate and print the summary
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=False))
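Because the system prompt asks for exactly 50 words, it can be useful to check how close a generated summary comes to that target. The sketch below assumes that counting whitespace-separated words is good enough. `word_count` and `length_deviation` are hypothetical helpers, not part of mlx_lm.

```python
TARGET_WORDS = 50  # word target stated in the system prompt


def word_count(text: str) -> int:
    """Count whitespace-separated words in a summary."""
    return len(text.split())


def length_deviation(text: str, target: int = TARGET_WORDS) -> int:
    """Signed distance from the target: negative is too short, positive too long."""
    return word_count(text) - target
```

To use it, capture the return value of `generate(...)` in a variable and pass that string to `length_deviation`.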
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B-Instruct |
| Algorithm | GRPO (via smolcluster) |
| Dataset | mlabonne/smoltldr (train split, Reddit summarization) |
| Stage 1 reward | Length penalty (deviation from 50-token target) |
| Stage 2 reward | METEOR + ROUGE |
| Hardware | Apple Silicon Mac mini cluster |
| Framework | MLX |
| Weights format | MLX safetensors (bf16) |
| Eval examples | 200 (test split) |
| Judge | gpt-5-mini-2025-08-07 via DeepEval G-Eval |
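The two reward stages in the table can be sketched as follows. This is an illustration under stated assumptions, not the smolcluster implementation: the linear shape of the length penalty, the unweighted sum of METEOR and ROUGE, and every function name are hypothetical. Metric scores are passed in as plain numbers, so no particular metric library is assumed. The last function shows the group-relative normalization that GRPO applies to rewards (reward minus the group mean, divided by the group standard deviation). It uses the population standard deviation here, and implementations differ on that detail.

```python
from statistics import mean, pstdev

TARGET_TOKENS = 50  # Stage 1 target from the table above


def length_reward(num_tokens, target=TARGET_TOKENS):
    """Stage 1 (assumed shape): 0 at the target, more negative as length deviates."""
    return -abs(num_tokens - target) / target


def quality_reward(meteor, rouge):
    """Stage 2 (assumed weighting): unweighted sum of the two metric scores."""
    return meteor + rouge


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Token counts of four hypothetical completions sampled for the same prompt.
rewards = [length_reward(n) for n in (50, 40, 75, 55)]
print(group_advantages(rewards))
```

Completions closer to the 50-token target receive positive advantages and those further away receive negative ones, which is what pushes the policy toward the target length in the first stage.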