AI & Machine Learning
Research, model releases, benchmarks, and deployments shaping intelligent machines.
Latest AI & Machine Learning
NVIDIA Blackwell tops STAC-AI LLM record in finance
Blackwell smashed STAC-AI's finance LLM speed records. NVIDIA says its Blackwell accelerators pushed a new high for large language model inference in finance on the STAC-AI benchmark, underscoring how modern hardware is turning unstructured data into actionable market signals at scale. The team repo
OpenAI Unveils Playbook for Trustworthy Evaluations
OpenAI just published a shared playbook to standardize third-party AI evaluations. The paper lays out a structured approach to judging frontier models along three axes: capabilities, safeguards, and validity. It outlines how to design assessment datasets, risk benchmarks, and how to document evaluation
Versioned tests finally make agent evaluation reliable
Versioned tests finally make agent evaluation reliable. A single benchmark no longer suffices when agents learn and drift in production, AWS argues. Its new approach marries fast-moving online signals with stable offline baselines using dataset management in Amazon Bedrock AgentCore. Th
Azerbaijani LLM on SageMaker AI Delivers Gains
Six weeks, a 23 percent throughput bump, and 58 percent GPU memory saved redefine Azerbaijani LLM training. Azercell Telecom LLC, Azerbaijan’s leading telecommunications provider, teamed up with AWS to push the boundaries of language models for a morphologically rich, low-resource language. The goal
Open source AI tool unifies fragmented enterprise data
AI-powered analytics finally connects fragmented data without SQL. Data Formulator 0.7, from Microsoft Research, is an open-source system that blends data connectivity, agent-guided exploration, and visualization refinement in a shared workspace. It targets a common enterprise pain: data sits in dat
AI Reimagines IVF with Robotic Labs and Selection
IVF has already created millions of families, but the process remains slow, painful, and expensive. Today, MIT Technology Review reports a wave of AI-driven advances that promise to change that: AI systems to identify promising sperm and embryos, robotic platforms that could automate parts of the la
RULER exposes hidden memory in machine unlearning
RULER finds hidden memory lingering in 10 of 12 unlearning tests. A new paper introduces RULER, a suite of representation-level verification metrics designed to test whether removing training data truly erases its imprint on a model. The project targets a gap in today’s unlearning protocols, which t
Frozen LLM Beats Fine-Tuned Models at Causal Discovery
A frozen language model beat trained rivals at causal discovery. Researchers asked whether large language models can reliably infer causal graphs from data, and the answer is blunt. Fine-tuning and in-context tricks do not fix the core problem. They prove a kernel obstruction theorem showing that su

AI Automates IVF Clinics, Signals New Era
MIT Technology Review's latest look at AI's practical bite shows the shift from flashy demos to real-world automation in a high-stakes medical setting. Researchers are layering AI to identify promising sperm and embryos, and they are building robotic systems to take over laborious, repetitive steps
Agentic AI Upends Enterprise Org Design
Eighty-five percent want agentic AI in three years, but 76 percent aren't ready. Enterprise leaders are trying to harness AI agents to run end to end, yet a widening gap between ambition and execution is surfacing, according to industry observers. The refrain is simple: teams want agents that can co

AI Jobs Panic Debunked by Data
The AI jobs panic is overhyped, new data show. A reality check from the latest analysis of US labor data suggests there hasn’t been a mass surge of unemployment in occupations most exposed to artificial intelligence. In fact, unemployment in AI-exposed roles is lower than in less exposed jobs, and t
7B Model Outsmarts Bigger LMs in Lean Proofs
A 7B-parameter model outshines giants in Lean proof optimization. ImProver 2 lays out a compelling case that neurosymbolic AI can turn small models into serious proof engines. The paper introduces a Lean 4-oriented framework that blends data-efficient expert iteration with a scaffold that couples fo
Energy per Successful Goal Reframes AI Power Use
Agentic AI workflows burn 4.33 times more energy per goal than linear runs. The paper introduces a new way to count energy in AI systems that orchestrate multiple steps, call tools, retry failed attempts, and recover from errors. Instead of measuring energy per inference or per training pass, the au
Zero-Cost Attribution for AI Toolchains
BOHM reads routing weights in AI toolchains and attributes decisions at zero cost. The paper introduces a hierarchical attribution method that works with compound AI systems by tracing how tasks travel through a tree of specialized components, without peeking under the hood of each module or paying
Scaling Creativity in the AI Era
AI is no longer optional for content, and audiences now binge 12 hours of video daily. The numbers tell the story before the story: producing original material at scale is expensive and demanding. A Hollywood feature runs on a baseline budget of about $150 million, and studios watch roughly $1 millio
AI for science vs agentic hype at Google I/O
WeatherNext warned Jamaica about a catastrophic hurricane, yet the talk of the AI singularity at Google I/O still feels aspirational. At the core of Google I/O this year was a stark contrast. Demis Hassabis framed the moment as a foreshadowing of a future where AI tools accelerate scientific discove
Diffusion LMs Promise Faster Text Generation
Nemotron-Labs Diffusion Language Models push a bold alternative to autoregressive generation by producing multiple tokens at once and then refining them step by step. Traditional AR models generate one token at a time, loading weights and waiting for each pass before the next token appears. The resu

Specialization Beats Scale in Enterprise OCR
A 3-billion-parameter specialized model beat every frontier API, and it costs about fifty times less. Dharma’s April release of DharmaOCR marks a pivot in enterprise AI: for structured OCR tasks, a small, tightly tuned model can outperform the big, multi-domain frontier APIs while slashing inference
Briefing
