For technically-aware investors

Built, not wrapped.

NIA - the engine under Omea.

NIA is a 350-billion-parameter Mixture-of-Experts model we built from zero on a custom narrative corpus anchored by 155,000+ licensed novels and other curated sources. The full STT → LLM → TTS path runs end-to-end on bare-metal Blackwell B200s, behind our own FastAPI gateway. One engineered system, not orchestrated agents. No OpenAI, no Anthropic, no Gemini anywhere in the hot path. We optimized for emotional intelligence - the axis the frontier labs treat as an afterthought.

Max Salamonowicz · CEO & NIA architect · AImmersive / Omea · max@aimmersive.ai

Section 02 · The Stack

End-to-end, on metal we control.

Three layers, one path, one hardware platform. Everything between the player's microphone and the AI-generated audio output is code and weights that we own and run.

STT

Speech-to-Text

Self-hosted Whisper or NVIDIA Parakeet (chosen per workload). Both open-source. Both running on our bare metal.

open-source · self-hosted

LLM

NIA - Narrative Intelligence Architecture

350B-parameter Mixture-of-Experts. Built from zero on a custom narrative corpus that includes 155,000+ licensed novels and other curated sources. Runs on a custom SGLang fork tuned to the bone for B200.

custom · ours

TTS

Maya - Text-to-Speech

Triton + TensorRT-optimized. RTF 0.22 on a single B200. 100 concurrent voice streams. BF16 precision.

custom · ours
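For readers less familiar with the RTF figure, here is a minimal sketch of what a real-time factor of 0.22 means, assuming the standard definition (synthesis time divided by the duration of the audio produced). The function names and the 2.2 s / 10 s timing are illustrative assumptions, not measurements from Maya, and the brief does not state whether 0.22 was measured per stream or under the 100-stream load.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Compute time needed to synthesize a clip of the given length."""
    return audio_seconds * rtf


# Hypothetical timing, chosen only so the ratio matches the quoted 0.22.
rtf = real_time_factor(processing_seconds=2.2, audio_seconds=10.0)

# An RTF below 1.0 means audio is generated faster than it plays back.
speedup = 1.0 / rtf
one_minute = synthesis_time(audio_seconds=60.0, rtf=0.22)
print(f"RTF {rtf:.2f}, one minute of speech in about {one_minute:.1f} s")
```

Under that definition, an RTF of 0.22 corresponds to generating speech roughly 4.5 times faster than playback.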

Foundation: bare-metal Blackwell B200 · Gateway: our FastAPI · External providers in the hot path: zero

One engineered system, not orchestrated agents

NIA isn't "agentic AI" - not dozens of LLM calls stitched together with prompts, fallback chains, and hope. It's a single purpose-built system: model, custom tokenizer, state machine, serving stack - all designed and built as one piece. The output isn't a stitched approximation of coherence. It's coherent by construction.

The deeper technical story - SGLang fork, EAGLE speculation, FlashInfer version hell, the Blackwell-specific kernel discoveries - is documented in the founder's Skunkworks deep-dive.

Section 03 · NIA in numbers

The receipts.

Four numbers worth memorizing. The rest live in the technical write-up.

350B

parameters · MoE

Built from zero

155K+

licensed novels in the corpus

Plus other curated sources. Clean dataset, no copyright cloud.

5×B200

serves 1,100 concurrent

4× for NIA, 1× for Maya TTS

~30×

scale headroom

3.3% avg capacity used in demo

Read more about our technical struggles - the Skunkworks article walks through the full inference-stack journey, including the +23.7% TTFT win from MoE shared-expert fusion and why FP8 KV cache turned out to be slower than BF16 on Blackwell.

Section 04 · The architectural argument

We optimized for EQ.
The frontier labs optimize for IQ.

Frontier LLMs - Claude Opus, GPT-class, Gemini Ultra - compete on MMLU, HumanEval, GPQA. Pure IQ benchmarks: reasoning, code, math, world knowledge.

NIA was built against a different objective entirely - from corpus and tokenizer to architecture and serving. Narrative coherence. Character continuity. Emotional pacing. Tonal control across long arcs. These don't exist as public benchmarks because the field treats emotional intelligence as an emergent downstream property of scale. We treated it as a primary objective end-to-end.

The proof is in deployment, not in argument: our first B2B customer (in HR) compared NIA head-to-head against the frontier models and chose NIA - explicitly on the EQ axis, where SOTA's deep reasoning didn't translate to the human-facing scenarios they needed to handle.

What this means in practice

NIA wins when the use case is: an extended human-facing conversation, with consistent persona and emotional state, where the agent has to respond like a person who has been listening, not retrieve and reason. Storytelling. HR. Coaching. Training. Anywhere the conversation matters more than the answer.

Section 05 · Liquid Memory

State, not retrieval.

What every other AI system calls "memory" today is RAG - retrieve relevant chunks of past context, paste them back into the prompt, hope the model uses them. The model itself stays stateless between turns. Memory is bolted on, not architectural.

Liquid Memory is different. NIA tracks emotional history, broken promises, character relationships, and structural narrative state across tens of millions of tokens within a single session - architecturally, not by retrieval. Betray an NPC in chapter one and they remember you in chapter five. Not because the right chunk got pulled back into the prompt. Because the model carries the state.
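For contrast, the retrieve-and-paste pattern described above can be sketched in a few lines. This is a deliberately naive illustration: the word-overlap scoring, the function names, and the sample history are all assumptions made for the example, and real RAG systems typically use embedding search rather than word counts. The point it shows is that the model call itself stays stateless, so whatever the retriever misses is simply absent from the prompt.

```python
from collections import Counter


def overlap_score(query: str, chunk: str) -> int:
    """Count lowercase words shared by the query and a stored chunk."""
    q = Counter(query.lower().split())
    c = Counter(chunk.lower().split())
    return sum((q & c).values())


def build_prompt(history: list[str], user_turn: str, k: int = 2) -> str:
    """Retrieve the k best-matching past chunks and paste them into the prompt."""
    ranked = sorted(history, key=lambda ch: overlap_score(user_turn, ch), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Relevant past context:\n{context}\n\nPlayer: {user_turn}"


# Hypothetical session log; nothing here is carried by the model itself.
history = [
    "chapter one: the player betrayed the blacksmith",
    "chapter two: the player bought a horse",
    "chapter three: the player crossed the river",
]
prompt = build_prompt(history, "the blacksmith greets the player", k=1)
print(prompt)
```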

Under the hood

Liquid Memory is backed by a pseudo-state non-deterministic machine paired with a custom tokenizer purpose-built for narrative. Standard frontier-LLM tokenizers are optimized to encode literal text efficiently. Ours encodes what's said between the lines - subtext, implication, broken promises, emotional weight. The state machine reads those signals and updates persistent narrative state turn by turn.
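To make "updates persistent narrative state turn by turn" concrete, here is a toy sketch of a state object that is carried across turns rather than retrieved. It is not NIA's implementation: the `NarrativeState` fields, the `TRUST_DELTA` weights, and the hand-written signals are all hypothetical, the update is deterministic (the machine described above is not), and in the real system the signals would come from the narrative tokenizer rather than from labelled strings.

```python
from dataclasses import dataclass, field

# Hypothetical signal weights, chosen only for illustration.
TRUST_DELTA = {"betrayal": -2, "kept_promise": 1}


@dataclass
class NarrativeState:
    """State that persists across turns instead of being re-retrieved."""
    trust: dict[str, int] = field(default_factory=dict)
    broken_promises: list[str] = field(default_factory=list)


def update(state: NarrativeState, kind: str, character: str, detail: str = "") -> NarrativeState:
    """Apply one turn's signal to the running state and return it."""
    delta = TRUST_DELTA.get(kind, 0)
    state.trust[character] = state.trust.get(character, 0) + delta
    if kind == "betrayal":
        state.broken_promises.append(f"{character}: {detail}")
    return state


state = NarrativeState()
update(state, "betrayal", "blacksmith", "sold the sword you promised to return")
update(state, "kept_promise", "blacksmith")

# The earlier betrayal still shapes the state several turns later.
print(state.trust["blacksmith"], state.broken_promises)
```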

This is also the single biggest reason a generic LLM cannot do what NIA does - even at frontier scale. SOTA models were trained against tokenizers optimized for different objectives; there's no surface for our state machine to attach to. RAG-class systems patch over the absence of long-form coherence; they don't produce it.

Max walked through Liquid Memory and the broader design choices in a long-form exclusive with Korea Game Desk - their writeup is the closest thing to a public architecture note we've put out.

Section 06 · The economics

Why bare metal isn't ideology.
It's math.

Every AI gaming, narrative, or character-AI company you'll see this year is built on third-party APIs - OpenAI, Anthropic, Google. Their margin gets taxed on every single user interaction. Their cost-per-interaction floor is set by someone else's API price list.

NIA on bare metal is the opposite. We pay for compute. The per-token margin stays inside the business.

Frontier LLM via API

Variable cost on every interaction
Margin taxed by model provider
Capacity gated by provider's rate limits
Wrong tool for our latency budget

NIA on bare metal

Fixed-cost compute, linear scale
Per-token margin stays in the business
Capacity gated only by our GPU footprint
Optimized end-to-end for the use case

The 80× math

Even if Claude Opus could meet our latency budget on this workload - it can't - the per-token cost would be roughly 80× higher than what we pay on bare metal. The use case literally doesn't exist on a frontier-API stack. That's not an opinion about our moat. That's the cost sheet.

Plus: ~30× scale headroom

Demo period averaged 3.3% server capacity used across 5× B200. We can roughly 30× the concurrent player count on the same hardware before provisioning new GPUs. Cost per concurrent user at this scale is below anything running on rented inference.
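The headroom figure follows directly from the utilization number. A minimal sketch of the arithmetic, assuming load scales linearly with concurrent players (the function name is illustrative):

```python
def headroom_multiple(avg_utilization: float) -> float:
    """How many times current load fits into the same hardware at 100% use."""
    if not 0 < avg_utilization <= 1:
        raise ValueError("avg_utilization must be in (0, 1]")
    return 1.0 / avg_utilization


# 3.3% average capacity used during the demo, as quoted above.
multiple = headroom_multiple(0.033)
print(f"{multiple:.1f}x")  # prints "30.3x"
```

Because 3.3% is an average, headroom measured against peak-hour load would be somewhat lower than this figure.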

Section 07 · Where NIA lives today

Order of operations.

Primary focus is the Omea consumer launch on Steam and App Store - that's where the company's energy is right now. B2B is real and earning revenue, but it grows alongside the consumer launch rather than ahead of it.

B2C · Primary

Omea

Our flagship narrative gaming platform. 66-day public demo just closed - ~1M words spoken to the narrator, 1.3B tokens of story produced. UK reviewer Stoffel Presents called it "mind-blowing" and "potentially groundbreaking". Steam Next Fest June 15-22.

launching

B2B · HR

HR partner pilot

First B2B revenue earned ($10.2K) against the NIA model. They evaluated frontier LLMs head-to-head and chose NIA on the EQ axis.

in production

B2B · EdTech

One partnership in conversation

Tutoring / role-play use case where emotional continuity matters more than benchmark IQ. Focus remains on Omea launch first.

exploratory

How partners access NIA

Hosted on our bare metal, behind our FastAPI gateway. Per-token pricing, contract per partner. Not yet open / self-serve - by design: capacity is managed, partners are vetted, the model is not a commodity endpoint.

One reply moves it forward.

This brief has been retired.
