In this work, we discuss building performant Multimodal Large Language Models (MLLMs). Through careful and comprehensive ablations of the image encoder, the vision-language connector, and various pre-training data choices, we identify several crucial design lessons: …
The pursuit of more immersive and interactive experiences has driven significant advances in artificial intelligence and machine learning. The paper introduces "Genie," a generative model that learns, without supervision, to create interactive environments from internet videos. With 11 billion parameters, Genie blends the spatiotemporal dynamics of video with the interactivity of virtual worlds. …
The paper "A Decoder-Only Foundation Model for Time-Series Forecasting" applies decoder-only models, commonly used in natural language processing, to time-series forecasting, achieving zero-shot forecasting across a variety of domains. …