v4.0.8  ·  Apache 2.0  ·  565 Tests Pass

ATLAS

Active-inference Training with Learned Adaptive Stigmergy

A Rust-native AGI training framework fusing stigmergic memory, n-morphic evolution, and GPU-accelerated inference.

565 tests · 0 failures
15.4 tok/s · OLMo-3-7B BF16
21 crates · pure Rust
v4.0.8 · Apache 2.0

What makes ATLAS different

Four interlocking systems that work together to enable genuine adaptive intelligence — not fine-tuning, not prompting, but evolutionary cognition in silicon.

🏛️

Stigmergic Memory Palace

Pheromone-guided navigation through semantic knowledge graphs. CAS decay calibration with 4 adaptive regimes. Cross-session warm-start: memories persist and strengthen with use, fade with neglect — just like biological memory.

Powered by GraphPalace
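The "strengthen with use, fade with neglect" dynamic can be sketched in a few lines. This is a minimal illustration with hypothetical types, not the actual GraphPalace API: edge strength decays exponentially between accesses and is reinforced by a fresh deposit on each use.

```rust
/// Hypothetical pheromone edge (illustrative, not the GraphPalace API).
struct PheromoneEdge {
    strength: f32,
    last_access: f64, // seconds since session start
}

impl PheromoneEdge {
    /// Exponential decay between accesses: rho(t) = rho0 * exp(-lambda * dt).
    fn decayed(&self, now: f64, lambda: f32) -> f32 {
        let dt = (now - self.last_access) as f32;
        self.strength * (-lambda * dt).exp()
    }

    /// Reinforce on access: apply the pending decay, then add a deposit.
    fn touch(&mut self, now: f64, lambda: f32, deposit: f32) {
        self.strength = self.decayed(now, lambda) + deposit;
        self.last_access = now;
    }
}
```

An edge left untouched for 10 time units at λ = 0.1 fades to e⁻¹ ≈ 0.37 of its strength; touching it at that moment restores it above the faded level, which is the warm-start behavior described above.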
🧬

N-Morphic Evolution

Champagnat–Méléard n-morphic framework with k-parallel phenotypic exploration. Lotka–Volterra competition between cognitive branches. InvasionFitnessScorer gates survival. O(1/√T) convergence guarantee on convex landscapes.

Champagnat 2006 · Méléard 2011

GPU Inference Engine

Pure Rust CUDA kernels — no Python, no ONNX, no dependency hell. Supports SmolLM2-135M through OLMo-3-7B with SWA attention, YaRN context extension, and RoPE scaling. INT8 quantization is on the roadmap for further throughput gains (target ~30+ tok/s on OLMo-3-7B).

sm_80 · A100-SXM4 · 15.4 tok/s OLMo-3-7B BF16 (v4.0.8)
🎛️

Advanced Sampling Engine

Production-grade 7-stage sampling pipeline: repetition penalty → frequency penalty → presence penalty → temperature → top-k → top-p → min-p → sample. OpenAI-compatible API with per-request configuration. Verified: 30/30 unique tokens on A100.

SamplingConfig · v4.0.6 · OpenAI-compatible
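The filtering stages of such a pipeline compose naturally over a logit vector. The sketch below is illustrative, not the real SamplingConfig implementation: it applies temperature inside the softmax, then top-k, top-p, and min-p filtering, and picks the most likely survivor (deterministic argmax instead of random sampling, and the repetition/frequency/presence penalty stages are omitted for brevity).

```rust
/// Hypothetical staged sampler: temperature -> top-k -> top-p -> min-p -> pick.
fn filter_and_pick(logits: &[f32], temp: f32, top_k: usize, top_p: f32, min_p: f32) -> usize {
    // Temperature-scaled softmax (max-subtraction for numerical stability).
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| ((l - max) / temp).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut probs: Vec<(usize, f32)> = exps.iter().map(|e| e / sum).enumerate().collect();

    // Sort tokens by descending probability.
    probs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    // top-k: keep only the k most likely tokens.
    probs.truncate(top_k.max(1));

    // top-p: keep the smallest prefix whose cumulative mass reaches p.
    let mut cum = 0.0;
    let mut keep = probs.len();
    for (i, (_, p)) in probs.iter().enumerate() {
        cum += p;
        if cum >= top_p { keep = i + 1; break; }
    }
    probs.truncate(keep);

    // min-p: drop tokens below min_p times the top probability.
    let p_max = probs[0].1;
    probs.retain(|(_, p)| *p >= min_p * p_max);

    probs[0].0 // argmax of the surviving candidates
}
```

Each stage only shrinks the candidate set, which is why the ordering matters: top-k bounds the sort cost, top-p adapts to the distribution's entropy, and min-p guards against long flat tails.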

Rigorous theory, not heuristics

Every algorithmic choice in ATLAS is grounded in peer-reviewed mathematics — from stochastic process theory to active inference.

Champagnat n-Morphic Framework (v4.0.0)
  • InvasionFitnessScorer — f(y) = success − cost − Σ cos_sim · n̄ (Champagnat–Méléard trait selection)
  • CanonicalPheromoneUpdate — Δρ = ½·μ·σ²·n̄·∂₁s (pheromone as evolutionary gradient)
  • BarBovier2017Constraints — Stability gate based on 2017 AAP escape rates
  • CognitiveBranching — n-morphic OODA bifurcation in atlas-astra
  • HJConcentrationPrior — Hopf–Cole T_eff(s) = T₀/(1 + γs) in atlas-trm
Δρ = ½·μ·σ²·n̄·∂₁s
f(y) = success − cost − Σ cos_sim·n̄
T_eff(s) = T₀ / (1 + γ·s)
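Two of these formulas reduce to one-liners. The sketch below is a numerical illustration with hypothetical signatures, not the real InvasionFitnessScorer or HJConcentrationPrior APIs:

```rust
/// f(y) = success − cost − Σ_i cos_sim_i · n̄_i
/// A mutant trait's payoff minus the competitive pressure exerted by each
/// resident population i (similarity-weighted by population mass n̄_i).
fn invasion_fitness(success: f32, cost: f32, cos_sims: &[f32], n_bar: &[f32]) -> f32 {
    let pressure: f32 = cos_sims.iter().zip(n_bar).map(|(c, n)| c * n).sum();
    success - cost - pressure
}

/// T_eff(s) = T₀ / (1 + γ·s)
/// Hopf–Cole style concentration: a stronger pheromone signal s lowers the
/// effective temperature, sharpening the sampling distribution.
fn t_eff(t0: f32, gamma: f32, s: f32) -> f32 {
    t0 / (1.0 + gamma * s)
}
```

For example, success 1.0, cost 0.2, and a single resident with cos_sim 0.5 and mass 0.4 gives f(y) = 0.6 (positive, so the mutant can invade), while γ = 2 and s = 0.5 halve the base temperature.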
BF16 GPU Inference Path (v4.0.2 → v4.0.8)
  • W16A32 Pattern — Weights in BF16 (14 GB VRAM), activations in FP32. Lossless: BF16 = upper 16 bits of FP32 bit pattern
  • sgemv_bf16_kernel — One-warp-per-row GEMV for N=1 decode. Fixed 32× waste in prior tiled GEMM path
  • GpuBufBf16 + GpuBufKind — Discriminated union F32/BF16 in atlas-tensor. HashSet tracks BF16 tensors across shards
  • Result — OLMo-3-7B-Think: 4.1 → 15.4 tok/s (3.75× speedup). 224/224 BF16 matrices confirmed
W16A32: weights BF16 · activations f32
VRAM: ~14 GB (vs ~28 GB f32)
Throughput: 15.4 tok/s (+3.75×)
Tests: 565/565 passing ✓
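The W16A32 trick rests on the bit-level relationship stated above: BF16 is the upper 16 bits of an IEEE-754 f32, so widening a stored weight back to f32 for the activation math is a shift, not a lookup. A minimal sketch (truncating encoder for simplicity; production encoders often round-to-nearest-even instead):

```rust
/// Pack an f32 into BF16 by keeping its upper 16 bits (truncation).
fn f32_to_bf16_bits(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Widen BF16 back to f32: shift left, zero-fill the low mantissa bits.
/// This direction is always lossless.
fn bf16_bits_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

/// W16A32 dot product: weights stored as u16 BF16 words, activations and
/// the accumulator stay in full f32 precision.
fn dot_w16a32(weights_bf16: &[u16], acts_f32: &[f32]) -> f32 {
    weights_bf16
        .iter()
        .zip(acts_f32)
        .map(|(&w, &a)| bf16_bits_to_f32(w) * a)
        .sum()
}
```

Values whose mantissa fits in BF16's 7 bits (powers of two, small dyadic fractions, and most checkpoint weights after BF16 training) round-trip exactly, which is what the "224/224 BF16 matrices confirmed" check verifies at the tensor level.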
OLMo-3-7B: SWA + YaRN (v4.0.1)
  • Sliding Window Attention — 24 sliding layers (window=4096) + 8 full-attention layers, banded mask via NEG_INFINITY
  • YaRN Context Extension — 3-band frequency decomposition (low/mid/high). Factor=8, attn_scale_factor=1.2079
  • Auto-Config Patching — patch_config_from_hf_json() reads layer_types, sliding_window, rope_scaling at load time
  • Logit Sanity — Spread: 16.803 ✓ · Max prob: 0.96% ✓ · 10/10 unique tokens ✓
Config: 32 layers (24×SWA + 8×full)
YaRN: factor=8, attn_factor=1.2079
Load: 103s · Inference: 4.1 tok/s (pre-BF16)
Test: gpu_inference_olmo3_quality_sanity ✓
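The banded mask mentioned above is simple to state: position i may attend to position j only if j ≤ i (causal) and i − j < window; every other entry is set to NEG_INFINITY so it vanishes under softmax. A sketch (illustrative, not the kernel code):

```rust
/// Build a sliding-window causal attention mask: 0.0 where attention is
/// allowed, f32::NEG_INFINITY where it is masked out before softmax.
fn banded_causal_mask(seq_len: usize, window: usize) -> Vec<Vec<f32>> {
    (0..seq_len)
        .map(|i| {
            (0..seq_len)
                .map(|j| {
                    // Causal: no attending to the future (j > i).
                    // Banded: no attending further back than `window` positions.
                    if j <= i && i - j < window { 0.0 } else { f32::NEG_INFINITY }
                })
                .collect()
        })
        .collect()
}
```

With window=4096, each of the 24 sliding layers caps attention cost per token at the window size regardless of context length, while the 8 full-attention layers retain global reach.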

Modular Crate Architecture

Each cognitive function lives in its own crate with clean interfaces. Composable, testable, and independently deployable.

atlas-model
Inference Engine
Transformer runtime with Sliding Window Attention, YaRN RoPE scaling, and multi-architecture support (Llama, OLMo, SmolLM2).
atlas-palace
Stigmergic Store
Pheromone memory with CAS decay calibration, session IDs, PalaceBackend trait, and cross-session warm-start.
atlas-corpus
Training Engine
InvasionFitnessScorer, StigmergicSampler, DeepSupervisionTrainer with N_sup latent carry and loss tracing.
atlas-astra
OODA Loop Engine
Adaptive OODA feedback with explore_ratio [0.1, 0.9], CognitiveBranching for n-morphic phenotype bifurcation.
atlas-trm
Thought Recursion
HJConcentrationPrior implementing Hopf–Cole temperature concentration. Recursive thought tree with configurable depth.
atlas-safety
Safety Constitution
Tractable Horn-clause safety rules — 8 principles across 4 domains. Declarative, auditable, formally verifiable.
atlas-api
HTTP Server
OpenAI-compatible REST + SSE streaming. 40 tests. Drop-in replacement for OpenAI API endpoints. Zero Python.
atlas-mcp
MCP Server
28 tools via JSON-RPC 2.0 on TCP :8765. McpConnectionPool (max 5, 5-min idle). Integrates with Claude & LangChain.

GPU Benchmark Results

Measured on NVIDIA A100-SXM4-40GB with CUDA 13.0, sm_80 kernels. FP32 weights (BF16 for OLMo-3-7B), 30-token generation runs, release build.

Inference Throughput
Hardware: NVIDIA A100-SXM4-40GB  ·  CUDA: 13.0  ·  Arch: sm_80  ·  Precision: FP32 / BF16*
Model              Parameters   Throughput   VRAM      Load Time
SmolLM2-135M       135M         37.7 tok/s   ~0.5 GB   2.0s
SmolLM2-360M       360M         25.4 tok/s   ~1.4 GB   4.4s
SmolLM2-1.7B       1.7B         12.6 tok/s   ~6.5 GB   22.5s
OLMo-3-7B-Think*   7B           15.4 tok/s   ~14 GB    103s

* OLMo-3-7B runs W16A32 BF16 GPU path (v4.0.2/v4.0.8) — weights BF16 (14GB VRAM), activations FP32. 3.75× speedup vs prior FP32 CPU path.

ATLAS Observatory

Explore the memory palace in 3D, forge thoughts with live LLM inference, watch n-morphic evolution in the arena, and interact with 28 MCP tools — all from your browser.

🏛️
Palace
3D force-directed graph of the memory palace with pheromone flow particles, bloom lighting, and semantic fly-to navigation.
⚒️
Forge
Live chat with OLMo-3-7B-Think via SSE streaming. Token confidence visualization and real-time OODA loop display.
⚔️
Arena
Watch k=1, 2, 4 morphic populations compete in real-time. Branching events, fitness landscapes, and +38% diversity gains.
📚
Library
Interactive K↔L↔1/ρ sliders, λ decay charts, Fleming–Viot diagrams, and the full crate dependency tree.
🔧
Workshop
12 MCP tool cards with live execution, tree/graph result viewers, operation log, and local-first architecture.
🔭 Launch Observatory on Hugging Face

GPU-accelerated · 5 interactive tabs · 13,659 lines of code · Runs on A10G/L4

Release Roadmap

ATLAS evolves continuously. Major versions represent theoretical milestones, not just feature additions.

v4.0.8
Anti-Repetition Defaults + Think Budget ✓ RELEASED
Fixed degenerate think loops with proper sampling defaults. SamplingConfig::olmo3() v2 preset. Think budget force-closes runaway <think> blocks after 200 tokens. Extended API: top_k, min_p, repetition_window. 565/565 tests.
v4.0.7
OLMo Post-Norm + QK-Norm Fixes ✓ RELEASED
Fixed 3 critical OLMo-2/3 bugs: post-norm layer ordering (norm(output)+residual), per-head QK-norm weight slicing, CPU path QK-norm. CPU/GPU logit agreement within 0.000015. 15.4 tok/s A100 BF16. 562/562 tests.
v4.0.6
Sampling Controls ✓ RELEASED
SamplingConfig 7-stage pipeline: rep_penalty → freq → pres → temp → top_k → top_p → min_p → sample. OpenAI-compatible API parameters. SamplingConfig::olmo3() preset. 30/30 unique tokens verified on A100.
v4.0.5
EOS Stopping + PRNG + ChatML ✓ RELEASED
Issues #13–#15 closed. generate() EOS dead code fixed. XorShift64 deterministic PRNG for reproducible sampling. Correct ChatML template for OLMo-3. 549 tests.
v4.0.4
GPT-4 Regex Tokenizer ✓ RELEASED
Issue #12 closed. OLMo-3 tokenizer mismatch fixed — switched to GPT-4 regex pattern. Correct BPE merges. 539 tests.
v4.0.3
λ Exp Decay Fix + Competition ReLU ✓ RELEASED
Issue #11 closed. CanonicalPheromoneUpdate λ clamped via sigmoid to prevent negative decay. InvasionFitnessScorer τ=0.2 ReLU threshold. 532 tests.
v4.0.2
BF16 GPU Inference Path ✓ RELEASED
W16A32 BF16 path for OLMo-3-7B. sgemv_bf16_kernel. 4.1 → 19.9 tok/s (4.8× speedup). 224/224 BF16 matrices. 528 tests.
v4.0.1
OLMo-3-7B · SWA + YaRN ✓ RELEASED
Full OLMo-3-7B-Think inference with Sliding Window Attention and YaRN context scaling. 528 tests. GPU sanity verified on A100.
v4.0.0
Champagnat N-Morphic Framework ✓ RELEASED
5-part P1–P5 implementation: InvasionFitnessScorer, CanonicalPheromoneUpdate, BarBovier2017Constraints, CognitiveBranching, HJConcentrationPrior. 516 tests.
v3.0.0
GPU Kernels + OpenAI API ✓ RELEASED
Pure Rust CUDA kernels (rmsnorm, rope, silu_mul), GpuVec activation buffers, atlas-api OpenAI-compatible server with SSE streaming. 435 tests.
v4.1.x
Sprint 1 — v5 RFC ▶ IN PROGRESS
Four proposals from Issue #10 (v5 RFC) underway: P1 PolymorphicTrainer (LoRA adapter morphic switching) · P2 SingularityDetector (eigenvalue singularity gate) · P3 PunctuatedCurriculum (pheromone-triggered epoch transitions) · P9 XSdcIsaSpec (ISA spec for RISC-V SDP core — patent-critical, deadline Dec 6 2026).
ATLAS Observatory launching on Hugging Face Spaces. Google.org Grant deadline May 1.
v4.1.0
INT8 Quantization NEXT
INT8 weight quantization for Linear layers. Target: ~30+ tok/s on OLMo-3-7B (current HBM2e utilization 6% — theoretical ceiling 69 tok/s). GpuMatrix quantized dispatch path.
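Per-row symmetric quantization is the usual starting point for INT8 Linear layers. The sketch below illustrates the idea only; it is a hypothetical helper, not the planned GpuMatrix dispatch path: each weight row is stored as i8 values plus one f32 scale and dequantized on the fly.

```rust
/// Quantize one weight row symmetrically: scale = max|w| / 127, so the
/// full i8 range is used and zero maps exactly to zero.
fn quantize_row(row: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = row.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = row.iter().map(|v| (v / scale).round() as i8).collect();
    (q, scale)
}

/// Recover approximate f32 weights: w ≈ q * scale.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}
```

Halving weight bytes again (vs BF16) roughly halves the HBM traffic per decoded token, which is why a memory-bound GEMV path stands to gain well beyond the current 6% bandwidth utilization.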
v5.0.0
ATLAS SDP Hardware Bridge PLANNED
Interface layer for the Stigmergic Dynamical Processor (SDP) FPGA prototype. k=4 parallel phenotypic streams on AMD Versal VCK190. Die estimate: ~1.40 mm² at TSMC 7nm. Provisional Patent #63/932,720 — conversion deadline Dec 6 2026.

Start building with ATLAS

Pure Rust. No Python. No ONNX. No vendor lock-in. Clone the repo, run the tests, and explore the memory palace.

Star on GitHub View Releases
git clone https://github.com/web3guru888/ATLAS && cd ATLAS && cargo test