v4.0.8  ·  Apache 2.0  ·  565 Tests Pass

ATLAS

Active-inference Training with Learned Adaptive Stigmergy

A Rust-native AGI training framework fusing stigmergic memory, n-morphic evolution, and GPU-accelerated inference.

565 tests · 0 failures
15.4 tok/s · OLMo-3-7B BF16
21 crates · pure Rust
v4.0.8 · Apache 2.0

What makes ATLAS different

Four interlocking systems that work together to enable genuine adaptive intelligence — not fine-tuning, not prompting, but evolutionary cognition in silicon.

🏛️

Stigmergic Memory Palace

Pheromone-guided navigation through semantic knowledge graphs. CAS decay calibration with 4 adaptive regimes. Cross-session warm-start: memories persist and strengthen with use, fade with neglect — just like biological memory.

Powered by GraphPalace
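The "strengthen with use, fade with neglect" dynamic can be sketched in a few lines. This is a minimal illustration with hypothetical types, not the actual GraphPalace API: edge strength decays exponentially between accesses and is reinforced by a fresh deposit on each use.

```rust
/// Hypothetical pheromone edge (illustrative, not the GraphPalace API).
struct PheromoneEdge {
    strength: f32,
    last_access: f64, // seconds since session start
}

impl PheromoneEdge {
    /// Exponential decay between accesses: rho(t) = rho0 * exp(-lambda * dt).
    fn decayed(&self, now: f64, lambda: f32) -> f32 {
        let dt = (now - self.last_access) as f32;
        self.strength * (-lambda * dt).exp()
    }

    /// Reinforce on access: apply the pending decay, then add a deposit.
    fn touch(&mut self, now: f64, lambda: f32, deposit: f32) {
        self.strength = self.decayed(now, lambda) + deposit;
        self.last_access = now;
    }
}
```

An edge left untouched for 10 time units at λ = 0.1 fades to e⁻¹ ≈ 0.37 of its strength; touching it at that moment restores it above the faded level, which is the warm-start behavior described above.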
🧬

N-Morphic Evolution

Champagnat–Méléard n-morphic framework with k-parallel phenotypic exploration. Lotka–Volterra competition between cognitive branches. InvasionFitnessScorer gates survival. O(1/√T) convergence guarantee on convex landscapes.

Champagnat 2006 · Méléard 2011

GPU Inference Engine

Pure Rust CUDA kernels — no Python, no ONNX, no dependency hell. Supports SmolLM2-135M through OLMo-3-7B with SWA attention, YaRN context extension, and RoPE scaling. INT8 quantization is on the roadmap for further throughput gains (target ~30+ tok/s on OLMo-3-7B).

sm_80 · A100-SXM4 · 15.4 tok/s OLMo-3-7B BF16 (v4.0.8)
🎛️

Advanced Sampling Engine

Production-grade 7-stage sampling pipeline: repetition penalty → frequency penalty → presence penalty → temperature → top-k → top-p → min-p → sample. OpenAI-compatible API with per-request configuration. Verified: 30/30 unique tokens on A100.

SamplingConfig · v4.0.6 · OpenAI-compatible
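The filtering stages of such a pipeline compose naturally over a logit vector. The sketch below is illustrative, not the real SamplingConfig implementation: it applies temperature inside the softmax, then top-k, top-p, and min-p filtering, and picks the most likely survivor (deterministic argmax instead of random sampling, and the repetition/frequency/presence penalty stages are omitted for brevity).

```rust
/// Hypothetical staged sampler: temperature -> top-k -> top-p -> min-p -> pick.
fn filter_and_pick(logits: &[f32], temp: f32, top_k: usize, top_p: f32, min_p: f32) -> usize {
    // Temperature-scaled softmax (max-subtraction for numerical stability).
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| ((l - max) / temp).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut probs: Vec<(usize, f32)> = exps.iter().map(|e| e / sum).enumerate().collect();

    // Sort tokens by descending probability.
    probs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    // top-k: keep only the k most likely tokens.
    probs.truncate(top_k.max(1));

    // top-p: keep the smallest prefix whose cumulative mass reaches p.
    let mut cum = 0.0;
    let mut keep = probs.len();
    for (i, (_, p)) in probs.iter().enumerate() {
        cum += p;
        if cum >= top_p { keep = i + 1; break; }
    }
    probs.truncate(keep);

    // min-p: drop tokens below min_p times the top probability.
    let p_max = probs[0].1;
    probs.retain(|(_, p)| *p >= min_p * p_max);

    probs[0].0 // argmax of the surviving candidates
}
```

Each stage only shrinks the candidate set, which is why the ordering matters: top-k bounds the sort cost, top-p adapts to the distribution's entropy, and min-p guards against long flat tails.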

Rigorous theory, not heuristics

Every algorithmic choice in ATLAS is grounded in peer-reviewed mathematics — from stochastic process theory to active inference.

Champagnat n-Morphic Framework (v4.0.0)
  • InvasionFitnessScorer — f(y) = success − cost − Σ cos_sim · n̄ (Champagnat–Méléard trait selection)
  • CanonicalPheromoneUpdate — Δρ = ½·μ·σ²·n̄·∂₁s (pheromone as evolutionary gradient)
  • BarBovier2017Constraints — Stability gate based on 2017 AAP escape rates
  • CognitiveBranching — n-morphic OODA bifurcation in atlas-astra
  • HJConcentrationPrior — Hopf–Cole T_eff(s) = T₀/(1 + γs) in atlas-trm
Δρ = ½·μ·σ²·n̄·∂₁s
f(y) = success − cost − Σ cos_sim·n̄
T_eff(s) = T₀ / (1 + γ·s)
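Two of these formulas reduce to one-liners. The sketch below is a numerical illustration with hypothetical signatures, not the real InvasionFitnessScorer or HJConcentrationPrior APIs:

```rust
/// f(y) = success − cost − Σ_i cos_sim_i · n̄_i
/// A mutant trait's payoff minus the competitive pressure exerted by each
/// resident population i (similarity-weighted by population mass n̄_i).
fn invasion_fitness(success: f32, cost: f32, cos_sims: &[f32], n_bar: &[f32]) -> f32 {
    let pressure: f32 = cos_sims.iter().zip(n_bar).map(|(c, n)| c * n).sum();
    success - cost - pressure
}

/// T_eff(s) = T₀ / (1 + γ·s)
/// Hopf–Cole style concentration: a stronger pheromone signal s lowers the
/// effective temperature, sharpening the sampling distribution.
fn t_eff(t0: f32, gamma: f32, s: f32) -> f32 {
    t0 / (1.0 + gamma * s)
}
```

For example, success 1.0, cost 0.2, and a single resident with cos_sim 0.5 and mass 0.4 gives f(y) = 0.6 (positive, so the mutant can invade), while γ = 2 and s = 0.5 halve the base temperature.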
BF16 GPU Inference Path (v4.0.2 → v4.0.8)
  • W16A32 Pattern — Weights in BF16 (14 GB VRAM), activations in FP32. Lossless: BF16 = upper 16 bits of FP32 bit pattern
  • sgemv_bf16_kernel — One-warp-per-row GEMV for N=1 decode. Fixed 32× waste in prior tiled GEMM path
  • GpuBufBf16 + GpuBufKind — Discriminated union F32/BF16 in atlas-tensor. HashSet tracks BF16 tensors across shards
  • Result — OLMo-3-7B-Think: 4.1 → 15.4 tok/s (3.75× speedup). 224/224 BF16 matrices confirmed
W16A32: weights BF16 · activations f32
VRAM: ~14 GB (vs ~28 GB f32)
Throughput: 15.4 tok/s (+3.75×)
Tests: 565/565 passing ✓
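The W16A32 trick rests on the bit-level relationship stated above: BF16 is the upper 16 bits of an IEEE-754 f32, so widening a stored weight back to f32 for the activation math is a shift, not a lookup. A minimal sketch (truncating encoder for simplicity; production encoders often round-to-nearest-even instead):

```rust
/// Pack an f32 into BF16 by keeping its upper 16 bits (truncation).
fn f32_to_bf16_bits(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Widen BF16 back to f32: shift left, zero-fill the low mantissa bits.
/// This direction is always lossless.
fn bf16_bits_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

/// W16A32 dot product: weights stored as u16 BF16 words, activations and
/// the accumulator stay in full f32 precision.
fn dot_w16a32(weights_bf16: &[u16], acts_f32: &[f32]) -> f32 {
    weights_bf16
        .iter()
        .zip(acts_f32)
        .map(|(&w, &a)| bf16_bits_to_f32(w) * a)
        .sum()
}
```

Values whose mantissa fits in BF16's 7 bits (powers of two, small dyadic fractions, and most checkpoint weights after BF16 training) round-trip exactly, which is what the "224/224 BF16 matrices confirmed" check verifies at the tensor level.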
OLMo-3-7B: SWA + YaRN (v4.0.1)
  • Sliding Window Attention — 24 sliding layers (window=4096) + 8 full-attention layers, banded mask via NEG_INFINITY
  • YaRN Context Extension — 3-band frequency decomposition (low/mid/high). Factor=8, attn_scale_factor=1.2079
  • Auto-Config Patching — patch_config_from_hf_json() reads layer_types, sliding_window, rope_scaling at load time
  • Logit Sanity — Spread: 16.803 ✓ · Max prob: 0.96% ✓ · 10/10 unique tokens ✓
Config: 32 layers (24×SWA + 8×full)
YaRN: factor=8, attn_factor=1.2079
Load: 103s · Inference: 4.1 tok/s (pre-BF16)
Test: gpu_inference_olmo3_quality_sanity ✓
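The banded mask mentioned above is simple to state: position i may attend to position j only if j ≤ i (causal) and i − j < window; every other entry is set to NEG_INFINITY so it vanishes under softmax. A sketch (illustrative, not the kernel code):

```rust
/// Build a sliding-window causal attention mask: 0.0 where attention is
/// allowed, f32::NEG_INFINITY where it is masked out before softmax.
fn banded_causal_mask(seq_len: usize, window: usize) -> Vec<Vec<f32>> {
    (0..seq_len)
        .map(|i| {
            (0..seq_len)
                .map(|j| {
                    // Causal: no attending to the future (j > i).
                    // Banded: no attending further back than `window` positions.
                    if j <= i && i - j < window { 0.0 } else { f32::NEG_INFINITY }
                })
                .collect()
        })
        .collect()
}
```

With window=4096, each of the 24 sliding layers caps attention cost per token at the window size regardless of context length, while the 8 full-attention layers retain global reach.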

Modular Crate Architecture

Each cognitive function lives in its own crate with clean interfaces. Composable, testable, and independently deployable.

atlas-model
Inference Engine
Transformer runtime with Sliding Window Attention, YaRN RoPE scaling, and multi-architecture support (Llama, OLMo, SmolLM2).
atlas-palace
Stigmergic Store
Pheromone memory with CAS decay calibration, session IDs, PalaceBackend trait, and cross-session warm-start.
atlas-corpus
Training Engine
InvasionFitnessScorer, StigmergicSampler, DeepSupervisionTrainer with N_sup latent carry and loss tracing.
atlas-astra
OODA Loop Engine
Adaptive OODA feedback with explore_ratio [0.1, 0.9], CognitiveBranching for n-morphic phenotype bifurcation.
atlas-trm
Thought Recursion
HJConcentrationPrior implementing Hopf–Cole temperature concentration. Recursive thought tree with configurable depth.
atlas-safety
Safety Constitution
Tractable Horn-clause safety rules — 8 principles across 4 domains. Declarative, auditable, formally verifiable.
atlas-api
HTTP Server
OpenAI-compatible REST + SSE streaming. 40 tests. Drop-in replacement for OpenAI API endpoints. Zero Python.
atlas-mcp
MCP Server
28 tools via JSON-RPC 2.0 on TCP :8765. McpConnectionPool (max 5, 5-min idle). Integrates with Claude & LangChain.

GPU Benchmark Results

Measured on NVIDIA A100-SXM4-40GB with CUDA 13.0, sm_80 kernels. FP32 weights (BF16 for OLMo-3-7B), 30-token generation runs, release build.

Inference Throughput
Hardware: NVIDIA A100-SXM4-40GB  ·  CUDA: 13.0  ·  Arch: sm_80  ·  Precision: FP32 / BF16*
Model              Parameters   Throughput   VRAM      Load Time
SmolLM2-135M       135M         37.7 tok/s   ~0.5 GB   2.0s
SmolLM2-360M       360M         25.4 tok/s   ~1.4 GB   4.4s
SmolLM2-1.7B       1.7B         12.6 tok/s   ~6.5 GB   22.5s
OLMo-3-7B-Think*   7B           15.4 tok/s   ~14 GB    103s

* OLMo-3-7B runs W16A32 BF16 GPU path (v4.0.2/v4.0.8) — weights BF16 (14GB VRAM), activations FP32. 3.75× speedup vs prior FP32 CPU path.

ATLAS Observatory

Explore the memory palace in 3D, forge thoughts with live LLM inference, watch n-morphic evolution in the arena, and interact with 28 MCP tools — all from your browser.

🏛️
Palace
3D force-directed graph of the memory palace with pheromone flow particles, bloom lighting, and semantic fly-to navigation.
⚒️
Forge
Live chat with OLMo-3-7B-Think via SSE streaming. Token confidence visualization and real-time OODA loop display.
⚔️
Arena
Watch k=1, 2, 4 morphic populations compete in real-time. Branching events, fitness landscapes, and +38% diversity gains.
📚
Library
Interactive K↔L↔1/ρ sliders, λ decay charts, Fleming–Viot diagrams, and the full crate dependency tree.
🔧
Workshop
12 MCP tool cards with live execution, tree/graph result viewers, operation log, and local-first architecture.
🔭 Launch Observatory on Hugging Face

GPU-accelerated · 5 interactive tabs · 13,659 lines of code · Runs on A10G/L4

Release Roadmap

ATLAS evolves continuously. Major versions represent theoretical milestones, not just feature additions.

v4.0.8
Anti-Repetition Defaults + Think Budget ✓ RELEASED
Fixed degenerate think loops with proper sampling defaults. SamplingConfig::olmo3() v2 preset. Think budget force-closes runaway <think> blocks after 200 tokens. Extended API: top_k, min_p, repetition_window. 565/565 tests.
v4.0.7
OLMo Post-Norm + QK-Norm Fixes ✓ RELEASED
Fixed 3 critical OLMo-2/3 bugs: post-norm layer ordering (norm(output)+residual), per-head QK-norm weight slicing, CPU path QK-norm. CPU/GPU logit agreement within 0.000015. 15.4 tok/s A100 BF16. 562/562 tests.
v4.0.6
Sampling Controls ✓ RELEASED
SamplingConfig 7-stage pipeline: rep_penalty → freq → pres → temp → top_k → top_p → min_p → sample. OpenAI-compatible API parameters. SamplingConfig::olmo3() preset. 30/30 unique tokens verified on A100.
v4.0.5
EOS Stopping + PRNG + ChatML ✓ RELEASED
Issues #13–#15 closed. generate() EOS dead code fixed. XorShift64 deterministic PRNG for reproducible sampling. Correct ChatML template for OLMo-3. 549 tests.
v4.0.4
GPT-4 Regex Tokenizer ✓ RELEASED
Issue #12 closed. OLMo-3 tokenizer mismatch fixed — switched to GPT-4 regex pattern. Correct BPE merges. 539 tests.
v4.0.3
λ Exp Decay Fix + Competition ReLU ✓ RELEASED
Issue #11 closed. CanonicalPheromoneUpdate λ clamped via sigmoid to prevent negative decay. InvasionFitnessScorer τ=0.2 ReLU threshold. 532 tests.
v4.0.2
BF16 GPU Inference Path ✓ RELEASED
W16A32 BF16 path for OLMo-3-7B. sgemv_bf16_kernel. 4.1 → 19.9 tok/s (4.8× speedup). 224/224 BF16 matrices. 528 tests.
v4.0.1
OLMo-3-7B · SWA + YaRN ✓ RELEASED
Full OLMo-3-7B-Think inference with Sliding Window Attention and YaRN context scaling. 528 tests. GPU sanity verified on A100.
v4.0.0
Champagnat N-Morphic Framework ✓ RELEASED
5-part P1–P5 implementation: InvasionFitnessScorer, CanonicalPheromoneUpdate, BarBovier2017Constraints, CognitiveBranching, HJConcentrationPrior. 516 tests.
v3.0.0
GPU Kernels + OpenAI API ✓ RELEASED
Pure Rust CUDA kernels (rmsnorm, rope, silu_mul), GpuVec activation buffers, atlas-api OpenAI-compatible server with SSE streaming. 435 tests.
v4.1.x
Sprint 1 — v5 RFC ▶ IN PROGRESS
Four proposals from Issue #10 (v5 RFC) underway: P1 PolymorphicTrainer (LoRA adapter morphic switching) · P2 SingularityDetector (eigenvalue singularity gate) · P3 PunctuatedCurriculum (pheromone-triggered epoch transitions) · P9 XSdcIsaSpec (ISA spec for RISC-V SDP core — patent-critical, deadline Dec 6 2026).
ATLAS Observatory launching on Hugging Face Spaces. Google.org Grant deadline May 1.
v4.1.0
INT8 Quantization NEXT
INT8 weight quantization for Linear layers. Target: ~30+ tok/s on OLMo-3-7B (current HBM2e utilization 6% — theoretical ceiling 69 tok/s). GpuMatrix quantized dispatch path.
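Per-row symmetric quantization is the usual starting point for INT8 Linear layers. The sketch below illustrates the idea only; it is a hypothetical helper, not the planned GpuMatrix dispatch path: each weight row is stored as i8 values plus one f32 scale and dequantized on the fly.

```rust
/// Quantize one weight row symmetrically: scale = max|w| / 127, so the
/// full i8 range is used and zero maps exactly to zero.
fn quantize_row(row: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = row.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = row.iter().map(|v| (v / scale).round() as i8).collect();
    (q, scale)
}

/// Recover approximate f32 weights: w ≈ q * scale.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}
```

Halving weight bytes again (vs BF16) roughly halves the HBM traffic per decoded token, which is why a memory-bound GEMV path stands to gain well beyond the current 6% bandwidth utilization.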
v5.0.0
ATLAS SDP Hardware Bridge PLANNED
Interface layer for the Stigmergic Dynamical Processor (SDP) FPGA prototype. k=4 parallel phenotypic streams on AMD Versal VCK190. Die estimate: ~1.40 mm² at TSMC 7nm. Provisional Patent #63/932,720 — conversion deadline Dec 6 2026.

Start building with ATLAS

Pure Rust. No Python. No ONNX. No vendor lock-in. Clone the repo, run the tests, and explore the memory palace.

Star on GitHub View Releases
git clone https://github.com/web3guru888/ATLAS && cd ATLAS && cargo test