SubQ claims breakthrough on LLM bottleneck

By Alexander Cole | JUN 19, 2026 | 3 min read

A Miami startup says it has hacked a decade-old bottleneck slowing every large language model.

Subquadratic, which emerged from stealth mode last month, claims its SubQ family is faster, cheaper, and far more energy efficient than existing models. The company says SubQ can process up to 12 times as much text at once, enabling tasks that require scouring hundreds of documents or entire code bases. The team reports that SubQ roughly matches the coding prowess of models from Google DeepMind, OpenAI, and Anthropic on key tasks, even as it uses a fraction of the energy and cost of rival systems.

The early reception of Subquadratic was skeptical. The company initially shared only a handful of self-published scores, prompting comparisons to a boilerplate startup pitch. As the rumor mill churned, a chorus of influential voices warned that it could be the Theranos of AI. A month later, Subquadratic began to publish more data, including results from independent evaluations conducted by third-party firm Appen. The arrival of outside benchmarking helped move the debate from speculative to testable, even as questions remain about how well SubQ will scale and how it performs across a wider array of real-world workloads.

The company says it has solved a mathematical bottleneck that has constrained LLM throughput for years. If SubQ truly delivers on its promises, the implications for data centers and enterprise deployment would be meaningful: faster inference at lower energy cost could tilt the economics of model serving, potentially changing where and how teams allocate compute for shipping products that depend on language intelligence. Subquadratic frames the breakthrough as an architectural and efficiency win, not merely a bigger model with more parameters.

Industry observers will be looking closely at how the independent Appen tests align with internal results. Third-party verification matters in this space, where a model can post strong single-scenario numbers but stumble when faced with diverse document sets, real code bases, or longer contextual horizons. The team says healthy skepticism was expected, and that continued external scrutiny will be essential as more users try SubQ in the wild and as the company weighs wider availability.

From a practitioner’s standpoint, two big takeaways stand out. First, if the 12x claim holds under broader testing, operators will need new guidance on how to structure deployment. Throughput and latency behave differently across configurations of GPUs, CPUs, and memory bandwidth, so a scaling plan that assumes a straight line from model speed to users served would be risky even with a more efficient core. Second, independent benchmarking is no longer optional. In the wake of mixed signals around breakthrough claims, Appen-like corroboration will be a gatekeeper for enterprise adoption, especially where compliance, safety, and reproducibility matter.
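To see why a faster core does not translate one-for-one into served capacity, consider a rough sketch based on Amdahl's law. It assumes only part of each request's time is spent in the stage that gets faster. The function name and the fractions below are hypothetical illustrations, not figures reported by Subquadratic or Appen.

```python
def end_to_end_speedup(accelerated_fraction: float, component_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the work gets faster.

    accelerated_fraction: share of total request time spent in the faster stage (0 to 1).
    component_speedup: how many times faster that stage becomes.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if component_speedup <= 0:
        raise ValueError("component_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / component_speedup)


# Hypothetical shares of request time spent in the accelerated stage.
for fraction in (0.5, 0.8, 0.95):
    overall = end_to_end_speedup(fraction, 12.0)
    print(f"{fraction:.0%} of time accelerated 12x -> {overall:.2f}x overall")
```

Under these assumed numbers, a 12x gain in a stage that accounts for 80 percent of request time yields 3.75x overall, because the untouched 20 percent (for example data loading, networking, or memory transfers) comes to dominate.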

There are important caveats to watch. SubQ is not yet widely available for trial by customers, and the absence of disclosed parameter counts leaves a key detail out of readers’ hands. Until more teams can run SubQ on their own data and workloads, the industry will remain in a wait-and-see phase, balancing the potential cost savings against the risk of overhyping a single set of numbers.

If Subquadratic can prove the independent results and deliver on a broader set of benchmarks, the model would stand as a rare case where a claimed bottleneck break translates into practical value rather than a headline. In the meantime, the industry will keep a close eye on how these early signals translate into real-world performance and how customers choose to deploy a model that promises to change the economics of language AI.
