Merlin Introduction. Merlin is a model that generates natural language responses linked to object trajectories. Given initial observations, it predicts and reasons about future events.
Future Reasoning Evaluation. Because no standardized benchmark for future reasoning exists, we developed the Future Reasoning Benchmark, derived from the existing MMBench. We also evaluate Merlin on mainstream tracking benchmarks to measure how well it aligns multiple images and identities. Notably, Merlin is the first model of its kind to perform tracking tasks.
Merlin-Chat Dataset Creation. To support Foresight Pre-Training (FPT) and Foresight Instruction-Tuning (FIT), we created the Merlin-Chat dataset. It consists of future reasoning conversations generated with GPT-4V and covers three scenarios: sports, lifestyle, and transportation. It comprises 30,000 unique dialogue samples with predicted trajectories for future reasoning. We also introduce "FPT-data," a dataset built specifically for the FPT task by repurposing existing open-source datasets.
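To make the idea of a dialogue sample with a predicted trajectory concrete, here is a minimal sketch of how one such record might be represented. The schema is purely hypothetical: the class names, field names, role labels, and the normalized (x1, y1, x2, y2) box format are assumptions for illustration and do not describe the released Merlin-Chat format. Only the three scenario names come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical schema: field names and box format are illustrative assumptions,
# not the released Merlin-Chat format.
SCENARIOS = {"sports", "lifestyle", "transportation"}  # the three scenarios named above

# Assumed box format: (x1, y1, x2, y2), normalized to [0, 1].
Box = Tuple[float, float, float, float]


@dataclass
class Turn:
    role: str  # "user" or "assistant" (assumed role names)
    text: str


@dataclass
class DialogueSample:
    scenario: str
    turns: List[Turn]
    # One box per future frame for the object being discussed (assumed layout).
    trajectory: List[Box] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject scenarios outside the three listed ones.
        if self.scenario not in SCENARIOS:
            raise ValueError(f"unknown scenario: {self.scenario!r}")
        # Reject boxes that are out of range or have inverted corners.
        for box in self.trajectory:
            x1, y1, x2, y2 = box
            if not (0.0 <= x1 <= x2 <= 1.0 and 0.0 <= y1 <= y2 <= 1.0):
                raise ValueError(f"malformed box: {box}")


sample = DialogueSample(
    scenario="sports",
    turns=[
        Turn("user", "Where will the player go next?"),
        Turn("assistant", "The player is likely to move toward the right side."),
    ],
    trajectory=[(0.10, 0.40, 0.20, 0.60), (0.15, 0.40, 0.25, 0.60)],
)
```

Validating the scenario and box coordinates at construction time is a design choice of this sketch; it simply keeps malformed records from propagating into later processing.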
Creating Merlin
*Overall pipeline of Merlin
Gallery
*Conversations generated with instructions provided by our users