Lab358 — AI Research

The Next Architecture for Language Models

A patented autoregressive language model with O(n log n) inference. Same quality as a Transformer at fundamentally lower compute cost — with unbounded context and an end-to-end learned retrieval mechanism.

The Problem

A three-way tradeoff in modern LLMs

Every current architecture forces a compromise between quality, inference cost, and the ability to learn its own retrieval. Lab358 resolves all three.

O(n²)

Transformers

State-of-the-art quality, but inference cost grows quadratically with context length. Production-scale serving is dominated by attention compute.

O(n)

State-space models

Models such as Mamba, Griffin, and RWKV achieve linear inference by compressing history into a fixed-size state, a lossy bottleneck that limits long-range recall.

External

RAG / bolt-on retrieval

External vector stores extend context, but the base model is never optimized for the retrieval system, so accuracy depends on a separately tuned pipeline.

Lab358's architecture is sub-quadratic, unbounded, and end-to-end learned — a single model that satisfies all three constraints under one cross-entropy loss.

Results

Transformer-equivalent quality at sub-quadratic cost

A 130M-parameter proof of concept, trained on a 10B-token dataset and evaluated against GPT-2 Small (124M parameters, ~40B training tokens).

Benchmark    Lab358 130M    GPT-2 Small 124M
HellaSwag    0.308          0.311
ARC-Easy     0.436          0.438
MMLU         0.230          0.229

Training efficiency

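Scores within 0.003 of GPT-2 Small on HellaSwag, ARC-Easy, and MMLU after just 2 epochs over a 10B-token dataset. That is roughly a quarter of the ~40B tokens GPT-2 trained on when counting unique data; the two passes process about 20B tokens in total.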

Architecture

Four properties, one model

A single architecture that is faster than a Transformer, retains more than a state-space model, and learns its own retrieval end-to-end.

Sub-quadratic complexity

O(n log n) inference scaling. Self-attention is replaced by a fully differentiable retrieval mechanism trained jointly with the language model under a single cross-entropy loss.
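
To give a feel for what the complexity-class change means at long context, here is a minimal sketch that compares idealized operation counts for n² and n log₂ n. It ignores constant factors, memory traffic, and hardware effects, so the ratios are asymptotic intuition only, not measured speedups of this architecture. The function names are illustrative and do not come from Lab358.

```python
import math


def quadratic_cost(n: int) -> float:
    """Idealized pairwise-interaction count for full self-attention over n tokens."""
    return float(n) * n


def nlogn_cost(n: int) -> float:
    """Idealized n * log2(n) operation count (constant factors ignored)."""
    return n * math.log2(n)


def speedup_ratio(n: int) -> float:
    """How many times larger the n^2 count is than the n log2 n count."""
    return quadratic_cost(n) / nlogn_cost(n)


if __name__ == "__main__":
    # Context lengths chosen only for illustration.
    for n in (1_024, 32_768, 1_048_576):
        print(f"n={n:>9,}  n^2 / (n log2 n) = {speedup_ratio(n):,.0f}")
```

The ratio simplifies to n / log₂ n, so the gap keeps widening as context grows.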

Unbounded context

No fixed-state information bottleneck. The retrieval mechanism gives the model open-ended access to its history without forcing compression into a constant-size hidden state.

Deployment-time tuning

The same trained model exposes a tunable retrieval breadth k — operators trade accuracy for inference speed at serving time without retraining.
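
The page does not describe how the retrieval mechanism works, so the following is only a generic top-k readout sketch that shows what a serving-time breadth knob k could look like. Everything here is an assumption: the function `topk_readout`, dot-product scoring, and the softmax weighting are hypothetical stand-ins, and the brute-force scoring over all keys is not sub-quadratic.

```python
import heapq
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def topk_readout(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    k: int,
) -> List[float]:
    """Softmax-weighted average of the k values whose keys score highest.

    Hypothetical illustration of a retrieval breadth parameter; it scores
    every key by brute force and is not Lab358's mechanism.
    """
    if not 1 <= k <= len(keys):
        raise ValueError("k must be between 1 and the number of keys")
    scores = [dot(query, key) for key in keys]
    # Indices of the k highest-scoring keys.
    top = heapq.nlargest(k, range(len(keys)), key=scores.__getitem__)
    # Subtract the max score before exponentiating for numerical stability.
    peak = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - peak) for i in top]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, top)) / total
        for d in range(dim)
    ]
```

In this sketch a smaller k reads fewer entries per step, and setting k to the number of keys recovers a full softmax readout over the whole history. That mirrors the accuracy-for-speed trade described above, under the stated assumptions.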

Drop-in compatible

Standard tokenizers, standard training corpora, standard evaluation harnesses. No separate retrieval objective, no curriculum, no multi-phase training schedule.

Why It Matters

Inference is the spend

$100B+

Annual global spend on AI inference

Inference > training

Serving cost, not training cost, is the dominant lifetime expense for any deployed model

Architectural

A complexity-class change is higher leverage than incremental hardware or quantization gains

Get in touch

Researchers, investors, and acquirers welcome

Reach out directly to discuss the architecture, the IP, or potential collaboration.

michael@lab358.ai