AI News — Thursday, April 9, 2026

9
Video-MME-v2: A New Benchmark for Comprehensive Video Understanding

Researchers introduce Video-MME-v2, a new benchmark for evaluating how comprehensively models understand video.

Hugging Face | research
8
Claw-Eval: Towards Trustworthy Evaluation of Autonomous Agents

Claw-Eval is a proposed framework for more trustworthy and robust evaluation of autonomous AI agents.

Hugging Face | research
8
AWS Boss Explains Why Investing Billions in Both Anthropic and OpenAI is an OK Conflict

The head of AWS clarifies the strategic rationale behind Amazon's substantial investments in both Anthropic and OpenAI, addressing potential conflicts of interest.

TechCrunch | industry
8
MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

MegaTrain is a method for training language models with more than 100 billion parameters in full precision on a single GPU.

Hugging Face | research
8
General Multimodal Protein Design Enables DNA-Encoding of Chemistry

A new multimodal AI system demonstrates the ability to design proteins and translate chemical information into DNA sequences.

Hugging Face | research
7
Learning to Retrieve from Agent Trajectories

A new research paper explores methods for AI agents to learn and retrieve information effectively from their past trajectories, improving decision-making.

Hugging Face | research
7
AIMock: One Mock Server For Your Entire AI Stack

AIMock is a mock server intended to streamline development and testing across an entire AI application stack.

Dev.to | product
7
AI Agent Poke Makes Setting Up Automations as Easy as Sending a Text

Poke is a new AI agent that simplifies the creation of automations, allowing users to set up complex tasks with simple text commands.

TechCrunch | product
7
Tubi is the First Streamer to Launch a Native App Within ChatGPT

Tubi becomes the first streaming service to integrate directly into ChatGPT with a native application, allowing users to access content recommendations and playback within the AI interface.

TechCrunch | product
7
Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework

Paper Circle is an open-source framework that uses multiple AI agents to help discover and analyze research papers.

Hugging Face | open-source
6
ACES: Leave-One-Out AUC Consistency for Code Generation Evaluation

A new evaluation metric, ACES, uses leave-one-out AUC consistency to make code generation evaluation more reliable by assessing the tests themselves.

Hugging Face | research
6
GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers

A new benchmark called GBQA uses games to evaluate the capabilities of Large Language Models in performing quality assurance engineering tasks.

Hugging Face | research
6
ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement

Researchers propose ThinkTwice, a method for optimizing LLMs to improve both their reasoning abilities and their capacity for self-correction.

Hugging Face | research
6
Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision

Vanast combines virtual try-on with human image animation, using synthetic triplet supervision to produce realistic results.

Hugging Face | research
5
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning

This paper identifies patterns of inefficiency in how AI models use external tools for reasoning, looking beyond accuracy metrics alone.

Hugging Face | research