BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//CMSA - ECPv6.15.20//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:CMSA
X-ORIGINAL-URL:https://cmsa.fas.harvard.edu
X-WR-CALDESC:Events for CMSA
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20240310T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20241103T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20250309T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20251102T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20260308T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20261101T060000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250516T120000
DTEND;TZID=America/New_York:20250516T130000
DTSTAMP:20260520T115918
CREATED:20250218T161047Z
LAST-MODIFIED:20250513T152517Z
UID:10003714-1747396800-1747400400@cmsa.fas.harvard.edu
SUMMARY:Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
DESCRIPTION:Member Seminar\nSpeaker: Samy Jelassi\, CMSA\nTitle: Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining\nAbstract: Reinforcement learning (RL) has become a crucial step in training state-of-the-art language models such as DeepSeek-R1 for solving mathematical problems. In this talk\, I will first review the mechanisms of RL fine-tuning. Then\, I will present a systematic end-to-end study of RL fine-tuning for mathematical reasoning\, in which models are trained entirely from scratch on different mixtures of fully open datasets and then fine-tuned with RL. This setup allows us to investigate how the pretraining data mixture affects the behavior of RL\, and how that effect interacts with model size and the choice of algorithm hyperparameters. Our study reveals that RL algorithms consistently converge towards a dominant output distribution\, amplifying patterns in the pretraining data. We also find that models of different scales trained on the same data mixture converge to distinct output distributions\, suggesting scale-dependent biases in model generalization.\nThe second part of the talk is based on joint work with Rosie Zhao\, Alex Meterez\, Cengiz Pehlevan\, Sham Kakade and Eran Malach: https://arxiv.org/abs/2504.07912
URL:https://cmsa.fas.harvard.edu/event/member-seminar-51625/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:Member Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-Member-Seminar-5.16.25.png
END:VEVENT
END:VCALENDAR