Lab358 — AI Research
The Next Architecture for Language Models
A patented autoregressive language model with O(n log n) inference. It matches Transformer quality at 130M-parameter scale at asymptotically lower compute cost, with unbounded context and an end-to-end learned retrieval mechanism.
The Problem
A three-way tradeoff in modern LLMs
Every current architecture forces a compromise between quality, inference cost, and the ability to learn its own retrieval. Lab358 resolves all three.
O(n²)
Transformers
State-of-the-art quality, but inference cost grows quadratically with context length. At long context lengths, serving cost is dominated by attention compute.
O(n)
State-space models
Mamba, Griffin, and RWKV achieve linear-time inference by compressing history into a fixed-size state, a lossy bottleneck that limits long-range recall.
External
RAG / bolt-on retrieval
External vector stores extend context, but the base model is never trained jointly with the retrieval system, so accuracy depends on a separately tuned pipeline.
Lab358's architecture is sub-quadratic in inference cost, unbounded in context, and learns its retrieval end-to-end: a single model that satisfies all three constraints under one cross-entropy loss.
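To make the gap between these complexity classes concrete, the sketch below compares idealized operation counts for quadratic and n log n scaling at a few context lengths. It ignores constant factors, memory traffic, and hardware effects, so it illustrates asymptotic growth only and is not a measurement of any real model. The helper names `relative_cost` and `quadratic_over_nlogn` are hypothetical and exist only for this illustration.

```python
import math


def relative_cost(n: int) -> dict[str, float]:
    """Idealized operation counts for a context of n tokens (constants ignored)."""
    return {
        "quadratic": float(n * n),
        "n_log_n": n * math.log2(n),
        "linear": float(n),
    }


def quadratic_over_nlogn(n: int) -> float:
    """How many times more work n^2 scaling implies than n log2 n scaling."""
    cost = relative_cost(n)
    return cost["quadratic"] / cost["n_log_n"]


if __name__ == "__main__":
    for n in (1_024, 32_768, 1_048_576):
        ratio = quadratic_over_nlogn(n)
        print(f"n = {n:>9,}   n^2 / (n log2 n) = {ratio:,.1f}")
```

The ratio simplifies to n / log2(n), so the advantage of n log n scaling widens as context grows, which is why the difference matters most for long-context serving.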
Results
Transformer-equivalent quality at sub-quadratic cost
A 130M-parameter proof-of-concept trained on a 10B-token dataset, evaluated against GPT-2 Small (124M parameters, ~40B training tokens).
| Benchmark | Lab358 130M | GPT-2 Small 124M |
|---|---|---|
| HellaSwag | 0.308 | 0.311 |
| ARC-Easy | 0.436 | 0.438 |
| MMLU | 0.230 | 0.229 |
Training efficiency
Matches GPT-2 Small to within 0.003 on HellaSwag, ARC-Easy, and MMLU after just 2 epochs over a 10B-token dataset: roughly a quarter of the data GPT-2 trained on, or about half when counting the 20B tokens processed across both epochs.
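The data comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this section (a 10B-token dataset, 2 epochs, and roughly 40B tokens for GPT-2 Small); the variable names are illustrative, and the GPT-2 figure is the approximate one stated above rather than an independently verified number.

```python
# Figures as stated in this section; the GPT-2 value is approximate.
UNIQUE_TOKENS = 10e9   # size of the training dataset
EPOCHS = 2             # passes over that dataset
GPT2_TOKENS = 40e9     # approximate GPT-2 Small training tokens

tokens_processed = UNIQUE_TOKENS * EPOCHS
unique_ratio = UNIQUE_TOKENS / GPT2_TOKENS        # share of unique data
processed_ratio = tokens_processed / GPT2_TOKENS  # share of tokens seen in training

print(f"unique data ratio:      {unique_ratio:.2f}")
print(f"processed tokens ratio: {processed_ratio:.2f}")
```

The two ratios answer different questions: the first compares dataset sizes, the second compares the number of tokens each model actually saw during training.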
Architecture
Four properties, one model
A single architecture that is faster than a Transformer, retains more than a state-space model, and learns its own retrieval end-to-end.
Sub-quadratic complexity
O(n log n) inference scaling in context length n. Self-attention is replaced by a fully differentiable retrieval mechanism trained jointly with the language model under a single cross-entropy loss.
Unbounded context
No fixed-state information bottleneck. The retrieval mechanism gives the model open-ended access to its history without forcing compression into a constant-size hidden state.
Deployment-time tuning
The same trained model exposes a tunable retrieval breadth k — operators trade accuracy for inference speed at serving time without retraining.
Drop-in compatible
Standard tokenizers, standard training corpora, standard evaluation harnesses. No separate retrieval objective, no curriculum, no multi-phase training schedule.
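The internals of the retrieval mechanism are not described on this page, so the following is only a generic illustration of how a retrieval breadth k can act as a serving-time knob. It is a plain top-k lookup over stored key and value vectors, not Lab358's method: the function `retrieve_top_k`, its arguments, and the dot-product scoring are all assumptions made for the example. It also scores every stored key exhaustively, so it does not reproduce the sub-quadratic cost claimed above.

```python
import heapq
import math
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(
    query: Vector,
    keys: Sequence[Vector],
    values: Sequence[Vector],
    k: int,
) -> list[float]:
    """Softmax-weighted mix of the k values whose keys best match the query.

    Hypothetical helper: a smaller k reads fewer history entries (less work),
    a larger k reads more (broader recall). No retraining is involved in
    changing k, because it only affects how many entries are read.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty and the same length")

    # Exhaustive scoring, used here only for clarity.
    scores = [dot(query, key) for key in keys]
    top = heapq.nlargest(k, range(len(keys)), key=scores.__getitem__)

    # Numerically stable softmax over the selected scores only.
    peak = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - peak) for i in top]
    total = sum(weights)

    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, top)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    for k in (1, 2, 3):
        print(k, retrieve_top_k([1.0, 0.0], keys, values, k))
```

With k = 1 the call returns the single best-matching value unchanged; raising k blends in lower-scoring entries, which is the accuracy-versus-speed trade the breadth parameter is meant to expose.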
Why It Matters
Inference is the spend
$100B+
Annual global spend on AI inference
Inference > training
Serving cost, not training cost, is the dominant lifetime expense for any deployed model
Architectural
A complexity-class change is higher leverage than incremental hardware or quantization gains
Get in touch
Researchers, investors, and acquirers welcome
Reach out directly to discuss the architecture, the IP, or potential collaboration.
michael@lab358.ai