timaeus/rl-lm-pythia160m-sentiment-pos-beta0-grpo-nostd-gs4-tp1-tk0-pt80000-steerDotIncL3c128s1-seed21 Updated 1 day ago
CM-EVS: Sparse Panoramic RGB-D-Pose Data for Complete Scene Coverage Paper • 2605.15597 • Published May 15
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published May 12
SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning Paper • 2605.09266 • Published May 10
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published May 3