
ether0: A 24B LLM Trained with Reinforcement Learning RL for Advanced Chemical Reasoning Tasks
LLMs primarily enhance accuracy through scaling pre-training data and computing resources. However, the attention has shifted towards alternate scaling due to finite data availability. This includes test-time training and inference compute scaling. Reasoning models enhance […]