Durable long-context AI

Your documents, indexed once —
reused across every chat, agent, and API call.

lab358 makes long-context AI durable: a document's memory is computed once and reused — no repeated prefill, no context-window ceiling, and the cost and latency savings land on every later call. Run it hosted on lab358 Cloud, or deploy it into your own AWS account.

Join the waitlist · Talk to us

Hosted on lab358 Cloud, or self-hosted in your own AWS account

Same product, same usage-based pricing — you pick where it runs.
See pricing →

The lab358 console Playground answering a question from an indexed document, with a cache-hit badge and time-to-first-token shown on every message.

HOW IT WORKS

Index once. Retrieve forever.

Your document is measured once — its memory computed and filed. Every later question retrieves just the pieces it needs, with no context-window limit, shared across your whole team.

Schematic: standard attention re-feeds the whole document and prefills it again on every request; lab358 measures the document once and retrieves only the pieces each question needs, with no fixed context window.
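As a rough illustration of the index-once, retrieve-per-question pattern described above, here is a toy sketch. It is not lab358's implementation: the class name `ToyDocumentIndex`, the word-based chunking, and the keyword-overlap scoring are all assumptions made for the example, standing in for whatever "memory" the product actually computes and files.

```python
import hashlib


class ToyDocumentIndex:
    """Toy stand-in for 'index once, retrieve forever' (hypothetical, not lab358's code)."""

    def __init__(self):
        self._store = {}       # doc_id -> list of text chunks, computed once
        self.index_calls = 0   # how many times the expensive step actually ran

    def index(self, text, chunk_words=50):
        # Derive a stable ID from the content so the same document maps to the same entry.
        doc_id = "doc_" + hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        if doc_id not in self._store:
            self.index_calls += 1  # the one-time work happens only here
            words = text.split()
            self._store[doc_id] = [
                " ".join(words[i:i + chunk_words])
                for i in range(0, len(words), chunk_words)
            ]
        return doc_id

    def retrieve(self, doc_id, question, top_k=2):
        # Return only the chunks that share the most words with the question.
        q_words = set(question.lower().split())
        chunks = self._store[doc_id]
        ranked = sorted(
            chunks,
            key=lambda c: len(q_words & set(c.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]


index = ToyDocumentIndex()
text = "alpha beta gamma delta epsilon zeta"
doc_id = index.index(text, chunk_words=3)
same_id = index.index(text, chunk_words=3)  # second call reuses the stored entry
pieces = index.retrieve(doc_id, "what is epsilon", top_k=1)
```

The point of the sketch is the shape of the workflow: the expensive step runs once per document, and each later question touches only the relevant pieces.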

See it work.

A workspace to chat over your documents, an OpenAI-compatible API to build on them, and a usage dashboard that makes the reuse — and the savings — legible.

Tour the platform

OpenAI-compatible · drop-in

from openai import OpenAI

client = OpenAI(
    base_url="https://cloud.lab358.ai/v1",
    api_key="lab358_…",
)

resp = client.chat.completions.create(
    model="lab358/model",
    messages=[{"role": "user", "content": "Summarize the Q3 board deck."}],
    # reuse a document you indexed once
    extra_body={"documents": ["doc_board"]},
)

# standard OpenAI-style response shape
print(resp.choices[0].message.content)
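To show how one indexed document might be reused across several questions, here is a minimal sketch that only builds the request arguments and makes no network call. The helper `build_request` and the second question are hypothetical; the `documents` field, the model name, and the `doc_board` ID are taken from the snippet above.

```python
def build_request(question, document_ids, model="lab358/model"):
    """Build keyword arguments for client.chat.completions.create (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        # Same field as in the snippet above: reference documents indexed earlier.
        "extra_body": {"documents": list(document_ids)},
    }


questions = [
    "Summarize the Q3 board deck.",
    "List the open risks mentioned in the deck.",  # illustrative question
]

# Every request points at the same document ID, so nothing is uploaded again.
requests = [build_request(q, ["doc_board"]) for q in questions]
```

Each entry could then be sent with `client.chat.completions.create(**request)`, assuming the client is configured as in the snippet above.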

Built to hold up.

Unbounded context

No context-length ceiling. Long context isn't capped by the model's training window.

Runs on far less GPU

Extreme context served efficiently — long context doesn't demand frontier-scale hardware.

Hosted, or in your own cloud

Run it managed on lab358 Cloud, or deploy it entirely inside your own AWS account — same product, you pick where it runs.

lab358 Cloud

Join the waitlist.

A hosted, usage-based workspace is on the way. Leave your email and we'll reach out the moment lab358 Cloud opens.

Your data stays yours.

On lab358 Cloud, your data is encrypted per workspace. When you self-host, it stays entirely inside your own VPC: prompts, documents, and outputs never leave your perimeter.

Built to clear security review · Read about security

Make context durable.

Index once. Reuse forever. Hosted on lab358 Cloud, or in your own account.