Diversity-Aware Policy Optimization for Large Language Model Reasoning

Publication
In The Thirty-ninth Annual Conference on Neural Information Processing Systems (Spotlight)