Nature, Published online: 28 January 2026; doi:10.1038/s41586-025-10041-x

Emu3 enables large-scale text, image and video learning based solely on next-token prediction, matching the generation and perception performance of task-specific methods, with implications for the development of scalable and unified multimodal intelligence systems.


From Nature via this RSS feed