The Extreme Inefficiency of RL for Frontier Models

www.tobyord.com

dynomight@lemmy.world to dynomight internet forum@lemmy.world · English · 2 months ago

The Extreme Inefficiency of RL for Frontier Models — Toby Ord

The new scaling paradigm for AI reduces the amount of information a model could learn per hour of training by a factor of 1,000 to 1,000,000. I explore what this means and its implications for scaling.
