site:www.marktechpost.com

When Claude Hallucinates in Court: The Latham & Watkins Incident and What It Means for Attorney Liability

There is a particular kind of irony that the legal profession rarely gets to witness in such pristine form. In May 2025, Latham & Watkins a firm that routinely bills over $2,000 an hour for its ...

marktechpost

Alibaba’s Qwen AI Releases Compact Dense Qwen3-VL 4B/8B (Instruct & Thinking) With FP8 Checkpoints

What’s in the release? SKUs and variants: The new additions comprise four dense models— Qwen3-VL-4B and Qwen3-VL-8B, each in Instruct and Thinking editions—alongside FP8 versions of the 4B/8B Instruct ...

marktechpost

Meta Introduces Autodata: An Agentic Framework That Turns AI Models into Autonomous Data Scientists for High-Quality Training Data Creation

The bottleneck in building better AI models has never been compute alone — it has always been data quality. Meta AI’s RAM (Reasoning, Alignment, and Memory) team is now addressing that bottleneck ...

marktechpost

Inworld AI Launches Realtime TTS-2: A Closed-Loop Voice Model That Adapts to How You Actually Talk

Voice AI has a dirty secret: most of it was never designed for conversation. The dominant paradigm — feed text in, get audio out — traces its lineage to audiobook narration and voiceover production, ...

marktechpost

Mistral AI Launches Remote Agents in Vibe and Mistral Medium 3.5 with 77.6% SWE-Bench Verified Score

Mistral AI has been quietly building one of the more practical coding agent ecosystems in the open-source/weights AI space, and they are shipping its most significant infrastructure upgrade yet.

marktechpost

AI Agents

A new memory framework from Google Cloud AI Research and UIUC gives LLM agents the ability to distill generalizable reasoning strategies from both successful and failed experiences — and combines that ...

marktechpost

Open Source

Mistral AI's latest release brings async cloud-based coding sessions, a new 128B flagship model, and an agentic Work mode to Le Chat — a meaningful step forward for developers building with AI agents.

marktechpost

The 20 Hottest Agentic AI Tools And Agents Of 2025 (So Far)

Shobha is a data analyst with a proven track record of developing innovative machine-learning solutions that drive business value.

marktechpost

OpenAI Releases HealthBench: An Open-Source Benchmark for Measuring the Performance and Safety of Large Language Models in Healthcare

OpenAI has released HealthBench, an open-source evaluation framework designed to measure the performance and safety of large language models (LLMs) in realistic healthcare scenarios. Developed in ...

marktechpost

What is Agentic RAG? Use Cases and Top Agentic RAG Tools (2025)

Agentic RAG combines the strengths of traditional RAG—where large language models (LLMs) retrieve and ground outputs in external context—with agentic decision-making and tool use. Unlike static ...

marktechpost

Hugging Face Releases Open LLM Leaderboard 2: A Major Upgrade Featuring Tougher Benchmarks, Fairer Scoring, and Enhanced Community Collaboration for Evaluating Language Models

Over the past year, the original Open LLM Leaderboard became a pivotal resource in the machine learning community, attracting over 2 million unique visitors and engaging 300,000 active monthly users.

marktechpost

Getting Started with MLFlow for LLM Evaluation

MLflow is a powerful open-source platform for managing the machine learning lifecycle. While it’s traditionally used for tracking model experiments, logging parameters, and managing deployments, ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results