MONDAY, MARCH 23, 2026
AI & Machine Learning · 3 min read

OpenAI Aims for Fully Automated AI Researcher by 2028

By Alexander Cole

Image: Robot hand reaching toward a human hand (Photo by Possessed Photography on Unsplash)

OpenAI plans to build a fully autonomous AI researcher that can chase big questions without human prompts.

The company has outlined a multi-year “north star” focused on agent-based research automation. By September 2026, it intends to deliver an autonomous AI research intern capable of tackling a small set of research problems on its own, a precursor to a fully automated multi-agent system slated to debut in 2028. The plan, disclosed in an exclusive interview with chief scientist Jakub Pachocki, signals a shift from prompting ever-smarter models to architecting systems that can design, run, and interpret experiments with limited human intervention.

If successful, the move could compress the time from hypothesis to insight by orders of magnitude. OpenAI envisions a stack of coordinated agents: planners, experiment runners, data analyzers, and risk monitors that can operate in concert across domains—from physics to biology to economics. In practice, that means less hand-holding of every step and more autonomous iteration: the system conceives experiments, executes simulations or live tests where safe, analyzes results, and surfaces next steps. It’s the sort of leap that would push AI research from “assisted by humans” to “led by machines,” at least for defined problem spaces.

But the ambition carries hefty implications for compute, data, and governance. Agent-based systems demand robust orchestration across tools, environments, and validation channels. They require careful design to prevent runaway experimentation, misinterpretation of results, or biased data loops from seeping into conclusions. The challenge isn’t just smarter APIs; it’s building a reliable, auditable, and safety-conscious research engine that can justify its conclusions to human reviewers.

The timing also intersects with a broader cautionary note the same week: even as AI accelerates research, some scientific frontiers remain stubbornly difficult to study with current methods. MIT Technology Review highlights that psychedelic drugs—psilocybin and related compounds—are experiencing a surge of interest across depression, PTSD, addiction, and obesity. Yet two studies published recently emphasize how hard it is to draw clean conclusions in this domain, underscoring that technology alone isn’t a silver bullet for every research bottleneck. In other words, the automation stack can speed exploration, but it won’t replace the need for rigorous study design, careful data interpretation, and domain-specific safeguards—especially in areas with high clinical stakes.

For practitioners, the launch offers both promise and peril. Here are concrete takeaways to watch as OpenAI’s initiative unfolds:

  • Compute and data costs will be a gating factor. An autonomous, multi-agent research engine will require persistent, low-latency access to diverse data streams, environments, and simulators. Expect a steep bill for compute, plus sophisticated data curation and provenance tooling to keep experiments reproducible.
  • Evaluation must be built in at every layer. “Autonomous” won’t mean correct by default. You’ll need multi-faceted evaluation: replication of results, sanity checks across model biases, and human-in-the-loop verification for high-stakes domains. The risk of blurring correlation and causation grows as automation scales.
  • Safety, governance, and sandboxing matter as much as capability. The more autonomy you give an AI researcher, the more you need guardrails, audit trails, and external review processes to prevent unsafe or unethical experiments from slipping through.
  • Early pilots will determine real-world payoff. In the near term, look for internal labs and selective partnerships testing the intern’s ability on narrow, well-bounded tasks. The quarter’s signal will be whether these pilots demonstrate reliable planning, experiment setup, and result interpretation without constant human nudges.
  • An analogy helps: this is like assigning a self-driving science notebook to a research team—one that can draft plans, run simulations or controlled experiments, and pull in results, but still needs a supervising driver, a map of safety rules, and a trusted checklist to avoid veering into misinterpretation or unsafe territory. If the guardrails hold, the lab could scale its exploratory velocity dramatically; if not, it risks amplifying erroneous conclusions just as fast as it accelerates discovery.

The big practical question for product teams this quarter isn’t “Will this replace researchers?” but “Where will automation add reliable value first, and how will we govern it?” Expect OpenAI to roll out cautious pilots, with emphasis on evaluation, safety protocols, and governance frameworks before any broader operational use in real-world research programs.

Sources

  • The Download: OpenAI is building a fully automated researcher, and a psychedelic trial blind spot
