AI News — Thursday, April 9, 2026
Researchers introduce Video-MME-v2, a significant benchmark designed to push the boundaries of comprehensive video understanding models.
A new framework, Claw-Eval, is proposed to provide a more trustworthy and robust evaluation methodology for autonomous AI agents.
The head of AWS clarifies the strategic rationale behind Amazon's substantial investments in both Anthropic and OpenAI, addressing potential conflicts of interest.
MegaTrain presents a breakthrough in training language models of over 100 billion parameters in full precision on a single GPU.
A new multimodal AI system demonstrates the ability to design proteins and translate chemical information into DNA sequences.
A new research paper explores methods for AI agents to learn and retrieve information effectively from their past trajectories, improving decision-making.
AIMock is introduced as a versatile mock server designed to streamline development and testing across an entire AI application stack.
Poke is a new AI agent that simplifies the creation of automations, allowing users to set up complex tasks with simple text commands.
Tubi becomes the first streaming service to integrate directly into ChatGPT with a native application, allowing users to access content recommendations and playback within the AI interface.
Paper Circle is introduced as an open-source framework that leverages multiple AI agents to assist in the discovery and analysis of research papers.
A new evaluation metric, ACES, is presented to assess the consistency and reliability of code generation models by evaluating the tests they generate.
A new benchmark called GBQA uses games to evaluate the capabilities of Large Language Models in performing quality assurance engineering tasks.
Researchers propose ThinkTwice, a method for optimizing LLMs to improve both their reasoning abilities and their capacity for self-correction.
Vanast introduces a novel approach to virtual try-on technology, using synthetic triplet supervision for realistic human image animation.
This paper investigates and reveals patterns of inefficiency in how AI models utilize external tools for reasoning, moving beyond simple accuracy metrics.