GRPO CodeReviewEnv
Reinforcement Learning · Bug-Fix Agent · Auto Difficulty Escalation
Qwen2.5-Coder-32B
HF Router
GRPO Training
Exec + LLM Judge Rewards
Current Level
HARD
Total Episodes
10
Win Streak
0
Last Reward
0.830
Training Stats
Extreme | 3 | 0.938 | 0.83 | 0.69 | Mastered
Live Episode Feed (last 20)
Episode | Level | Reward
10 | Medium | 0.830
10 | Hard | 0.830
9 | Medium | 1.000
8 | Medium | 1.000
7 | Medium | 1.000
6 | Medium | 0.690
5 | Medium | 1.000
4 | Medium | 1.000
3 | Easy | 1.000
2 | Easy | 1.000
1 | Easy | 1.000
Auto-refreshes every 3s | Escalate threshold: 0.8 | Window: 5
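
The escalate threshold (0.8) and window (5) shown above suggest a rolling-reward rule for moving between difficulty levels. The sketch below is one possible reading, not the environment's actual logic: it assumes the agent advances one level once the mean of its last `window` rewards at the current level reaches `threshold`, and that the window resets after each escalation. The class name `DifficultyEscalator` and its methods are hypothetical; only the level names, the threshold, and the window size come from the dashboard.

```python
from collections import deque
from statistics import mean

# Level names as they appear on the dashboard, in assumed ascending order.
LEVELS = ["Easy", "Medium", "Hard", "Extreme"]


class DifficultyEscalator:
    """Hypothetical rolling-window escalation rule (an assumption, not the real one)."""

    def __init__(self, threshold=0.8, window=5):
        self.threshold = threshold
        self.window = window
        self.level_idx = 0
        # Keeps only the most recent `window` rewards at the current level.
        self.recent = deque(maxlen=window)

    @property
    def level(self):
        return LEVELS[self.level_idx]

    def record(self, reward):
        """Store one episode reward and return the level for the next episode."""
        self.recent.append(reward)
        window_full = len(self.recent) == self.window
        can_escalate = self.level_idx < len(LEVELS) - 1
        if window_full and can_escalate and mean(self.recent) >= self.threshold:
            self.level_idx += 1
            self.recent.clear()  # assumed: start a fresh window at the new level
        return self.level
```

Under these assumptions, five consecutive rewards of 1.000 at Easy would move the agent to Medium, and a single low reward such as 0.690 would not block escalation as long as the window mean stays at or above 0.8. The feed above escalates from Easy after only three episodes, so the real rule evidently differs in at least the minimum-episode requirement.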